Categorizing values in JavaScript

[2013-01-20] dev, javascript, advancedjs, jslang
This post examines four ways in which values can be categorized in JavaScript: via the hidden property [[Class]], via the typeof operator, via the instanceof operator and via the function Array.isArray(). We’ll also look at the prototype objects of built-in constructors, which produce unexpected categorization results.

[This post is a copy of my Adobe Developer Connection article; I’m publishing it here for archival purposes.]

Required knowledge

Before we can get started with the actual topic, we have to review some required knowledge.

Primitives versus objects

All values in JavaScript are either primitives or objects.

Primitives. The following values are primitive:

  • undefined
  • null
  • Booleans
  • Numbers
  • Strings
Primitives are immutable; you can’t add properties to them:
    > var str = "abc";
    > str.foo = 123;  // try to add property "foo"
    123
    > str.foo  // no change
    undefined
Primitives are also compared by value; they are considered equal if they have the same content:
    > "abc" === "abc"
    true

Objects. All non-primitive values are objects. Objects are mutable:

    > var obj = {};
    > obj.foo = 123;  // try to add property "foo"
    123
    > obj.foo  // property "foo" has been added
    123
And objects are compared by reference. Each object has its own identity and two objects are only considered equal if they are, in fact, the same object:
    > {} === {}
    false

    > var obj = {};
    > obj === obj
    true
Wrapper object types. The primitive types boolean, number and string have the corresponding wrapper object types Boolean, Number and String. Instances of the latter are objects and different from the primitives that they are wrapping:
    > typeof new String("abc")
    'object'
    > typeof "abc"
    'string'
    > new String("abc") === "abc"
    false
Wrapper object types are rarely used directly, but their prototype objects define the methods of primitives. For example, String.prototype is the prototype object of the wrapper type String. All of its methods are also available for strings. Take the wrapper method String.prototype.indexOf. Primitive strings have the same method. Not a different method with the same name, literally the same method:
    > String.prototype.indexOf === "".indexOf
    true

Internal properties

Internal properties are properties that cannot be directly accessed from JavaScript, but influence how it works. The names of internal properties start with an uppercase letter and are written in double square brackets. An example: [[Extensible]] holds a boolean flag that determines whether or not properties can be added to an object. Its value can be manipulated indirectly: Object.isExtensible() reads its value, Object.preventExtensions() sets its value to false. Once it is false, there is no way to change its value to true.
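The effect of [[Extensible]] can be observed indirectly. The following is a minimal sketch, not part of the original article: the helper tryAddProperty() and the object point are hypothetical names introduced for illustration. The helper runs in strict mode so that a rejected assignment throws instead of failing silently.

```javascript
// Hypothetical helper: returns true if a property could be added to obj.
function tryAddProperty(obj, key, value) {
    "use strict"; // strict mode: a rejected assignment throws a TypeError
    try {
        obj[key] = value;
        return true;
    } catch (e) {
        return false;
    }
}

var point = { x: 1 };
var extensibleBefore = Object.isExtensible(point); // reads [[Extensible]]
var addedBefore = tryAddProperty(point, "y", 2);

Object.preventExtensions(point); // sets [[Extensible]] to false
var extensibleAfter = Object.isExtensible(point);
var addedAfter = tryAddProperty(point, "z", 3);

console.log(extensibleBefore, addedBefore); // true true
console.log(extensibleAfter, addedAfter);   // false false
```

Existing properties such as point.x and point.y remain writable; only the addition of new properties is blocked.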

Terminology: prototypes versus prototype objects

In JavaScript, the term prototype is unfortunately a bit overloaded:
  1. On one hand, there is the prototype-of relationship between objects. Each object has a hidden property [[Prototype]] that either points to its prototype or is null. The prototype is a continuation of the object. If a property is accessed and it can’t be found in the latter, the search continues in the former. Several objects can have the same prototype.
  2. On the other hand, if a type is implemented by a constructor Foo then that constructor has a property Foo.prototype that holds the type’s prototype object.
To make the distinction clear we call (1) “prototypes” and (2) “prototype objects”. Three methods help with dealing with prototypes:
  • Object.getPrototypeOf(obj) returns the prototype of obj:
        > Object.getPrototypeOf({}) === Object.prototype
        true
    
  • Object.create(proto) creates an empty object whose prototype is proto:
        > Object.create(Object.prototype)
        {}
    
    Object.create() can do more, but that is beyond the scope of this post.
  • proto.isPrototypeOf(obj) returns true if proto is a prototype of obj (or a prototype of a prototype, etc.).
        > Object.prototype.isPrototypeOf({})
        true
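
The prototype-of relationship described in (1) can be sketched as follows. The objects greeter, alice and bob are hypothetical names used only for illustration; the sketch shows a property lookup continuing in the prototype and two objects sharing one prototype.

```javascript
// Hypothetical prototype shared by two objects.
var greeter = {
    greet: function () {
        return "Hello, " + this.name;
    }
};

var alice = Object.create(greeter);
alice.name = "Alice";
var bob = Object.create(greeter);
bob.name = "Bob";

// greet is not an own property of alice; the lookup continues in greeter.
console.log(alice.hasOwnProperty("greet")); // false
console.log(alice.greet());                 // Hello, Alice
console.log(bob.greet());                   // Hello, Bob

// Both objects have the same prototype.
console.log(Object.getPrototypeOf(alice) === Object.getPrototypeOf(bob)); // true
console.log(greeter.isPrototypeOf(bob));                                  // true
```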
    

The property “constructor”

Given a constructor function Foo, the prototype object Foo.prototype has a property Foo.prototype.constructor that points back to Foo. That property is set up automatically for each function.
    > function Foo() { }
    > Foo.prototype.constructor === Foo
    true
    > RegExp.prototype.constructor === RegExp
    true
All instances of a constructor inherit that property from the prototype object. Thus, we can use it to determine which constructor created an instance:
    > new Foo().constructor
    [Function: Foo]
    > /abc/.constructor
    [Function: RegExp]

Categorizing values

Let’s look at four ways of categorizing values:
  • [[Class]] is an internal property with a string that classifies an object
  • typeof is an operator that categorizes primitives and helps distinguish them from objects
  • instanceof is an operator that categorizes objects
  • Array.isArray() is a function that determines whether a value is an array

[[Class]]

[[Class]] is an internal property whose value is one of the following strings:
"Arguments", "Array", "Boolean", "Date", "Error", "Function", "JSON", "Math", "Number", "Object", "RegExp", "String"
The only way to access it from JavaScript code is via the default toString() method, Object.prototype.toString(). That method is generic and returns
  • "[object Undefined]" if this is undefined,
  • "[object Null]" if this is null,
  • "[object " + obj.[[Class]] + "]" if this is an object obj.
  • A primitive is converted to an object and then handled like in the previous rule.
Examples:
    > Object.prototype.toString.call(undefined)
    '[object Undefined]'
    > Object.prototype.toString.call(Math)
    '[object Math]'
    > Object.prototype.toString.call({})
    '[object Object]'
Therefore, the following function returns the [[Class]] of a value x:
    function getClass(x) {
        var str = Object.prototype.toString.call(x);
        return /^\[object (.*)\]$/.exec(str)[1];
    }
Here is that function in action:
    > getClass(null)
    'Null'

    > getClass({})
    'Object'

    > getClass([])
    'Array'

    > getClass(JSON)
    'JSON'

    > (function () { return getClass(arguments) }())
    'Arguments'

    > function Foo() {}
    > getClass(new Foo())
    'Object'
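
The rule for primitives (they are converted to objects first) means that getClass() cannot tell a primitive from its wrapper object. A short sketch, repeating the getClass() function from above so that it is self-contained:

```javascript
function getClass(x) {
    var str = Object.prototype.toString.call(x);
    return /^\[object (.*)\]$/.exec(str)[1];
}

// Primitives are converted to wrapper objects before [[Class]] is read.
console.log(getClass("abc"));             // String
console.log(getClass(123));               // Number
console.log(getClass(true));              // Boolean

// A wrapper object yields the same result as the primitive it wraps.
console.log(getClass(new String("abc"))); // String
```

If the distinction matters, typeof can be used in addition, because it returns "string" for the primitive and "object" for the wrapper.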

typeof

The typeof operator categorizes primitives and allows us to distinguish between primitives and objects.
    typeof value
returns one of the following strings, depending on the operand value:

    Operand            Result
    undefined          "undefined"
    null               "object"
    Boolean value      "boolean"
    Number value       "number"
    String value       "string"
    Function           "function"
    All other values   "object"

typeof returning "object" for null is a bug. It can’t be fixed, because that would break existing code. Note that a function is also an object, but typeof makes a distinction. Arrays, on the other hand, are considered objects by it.
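
The table can be checked with a few sample operands. This is only an illustrative sketch; the array rows is a hypothetical name, and each entry pairs a description of the operand with the result of typeof.

```javascript
var rows = [
    ["undefined", typeof undefined],
    ["null", typeof null],
    ["true", typeof true],
    ["123", typeof 123],
    ['"abc"', typeof "abc"],
    ["function () {}", typeof function () {}],
    ["{}", typeof {}],
    ["[]", typeof []]
];

// Print one line per operand, for example: null -> object
rows.forEach(function (row) {
    console.log(row[0] + " -> " + row[1]);
});
```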

instanceof

instanceof checks whether a value is an instance of a type:
    value instanceof Type
The operator looks at Type.prototype and checks whether it is in the prototype chain of value. That is, if we were to implement instanceof ourselves, it would look like this (minus some error checks, such as for Type being null):
    function myInstanceof(value, Type) {
        return Type.prototype.isPrototypeOf(value);
    }
instanceof always returns false for primitive values:
    > "" instanceof String
    false
    > "" instanceof Object
    false
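
To make the walk along the prototype chain explicit, here is a sketch that follows the chain by hand and includes a basic error check. The name instanceOfSketch is hypothetical, and the function deliberately ignores other details of the real operator; it is meant as an illustration, not a replacement.

```javascript
function instanceOfSketch(value, Type) {
    if (typeof Type !== "function") {
        throw new TypeError("Right-hand side must be a function");
    }
    // Primitives are never instances of anything.
    var isObject = (typeof value === "object" && value !== null)
        || typeof value === "function";
    if (!isObject) {
        return false;
    }
    // Walk the prototype chain, looking for Type.prototype.
    var proto = Object.getPrototypeOf(value);
    while (proto !== null) {
        if (proto === Type.prototype) {
            return true;
        }
        proto = Object.getPrototypeOf(proto);
    }
    return false;
}

console.log(instanceOfSketch([], Array));  // true
console.log(instanceOfSketch([], Object)); // true
console.log(instanceOfSketch("", String)); // false
```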

Array.isArray()

Array.isArray() exists because of one particular problem in browsers: each frame has its own global environment. Consider a frame A and a frame B (where either one can be the document). Code in frame A can pass a value to code in frame B. The code in B then cannot use instanceof Array to check whether the value is an array, because B’s Array is different from A’s Array (of which the value could be an instance). An example:
    <html>
    <head>
        <script>
            // test() is called from the iframe
            function test(arr) {
                var iframeWin = frames[0];
                console.log(arr instanceof Array); // false
                console.log(arr instanceof iframeWin.Array); // true
                console.log(Array.isArray(arr)); // true
            }
        </script>
    </head>
    <body>
        <iframe></iframe>
        <script>
            // Fill the iframe
            var iframeWin = frames[0];
            iframeWin.document.write(
                '<script>window.parent.test([])</'+'script>');
        </script>
    </body>
    </html>
Therefore, ECMAScript 5 introduced Array.isArray() which uses [[Class]] to determine whether a value is an array. The intention was to make JSON.stringify() safe. But the same problem exists for all types when used in conjunction with instanceof.
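
For engines that lack Array.isArray(), a check based on [[Class]] could be sketched as follows. The name isArrayByClass is hypothetical. Because the check does not consult any constructor, it does not depend on which frame's Array created the value.

```javascript
// Hypothetical fallback: reads [[Class]] via Object.prototype.toString().
function isArrayByClass(value) {
    return Object.prototype.toString.call(value) === "[object Array]";
}

console.log(isArrayByClass([]));            // true
console.log(isArrayByClass({ length: 0 })); // false (array-like, but not an array)
console.log(isArrayByClass("abc"));         // false

// The arguments object is array-like, too, but has its own [[Class]].
console.log((function () { return isArrayByClass(arguments); }())); // false
```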

Built-in prototype objects

The prototype objects of built-in types are strange values: they are primal members of the type, but not instances of it. That leads to categorization being quirky. By examining the quirkiness, we can deepen our understanding of categorization.

Object.prototype

Object.prototype is similar to an empty object: It is printed as one and does not have any enumerable own properties (its methods are all non-enumerable).
    > Object.prototype
    {}
    > Object.keys(Object.prototype)
    []
Unexpected. Object.prototype is an object, but it is not an instance of Object. On one hand, both typeof and [[Class]] recognize it as an object:
    > getClass(Object.prototype)
    'Object'
    > typeof Object.prototype
    'object'
On the other hand, instanceof does not consider it an instance of Object:
    > Object.prototype instanceof Object
    false
In order for the above result to be true, Object.prototype would have to be in its own prototype chain. But that would cause a cycle in the chain, which is why Object.prototype does not have a prototype. It is the only built-in object that doesn’t have one.
    > Object.getPrototypeOf(Object.prototype)
    null
This kind of paradox holds for all built-in prototype objects: They are considered instances of their type by all mechanisms except instanceof.

Expected. [[Class]], typeof and instanceof agree on most other objects:

    > getClass({})
    'Object'
    > typeof {}
    'object'
    > {} instanceof Object
    true

Function.prototype

Function.prototype is itself a function. It accepts any arguments and returns undefined:
    > Function.prototype("a", "b", 1, 2)
    undefined
Unexpected. Function.prototype is a function, but not an instance of Function: On one hand, typeof, which checks whether an internal [[Call]] method is present, says that Function.prototype is a function:
    > typeof Function.prototype
    'function'
The [[Class]] property says the same:
    > getClass(Function.prototype)
    'Function'
On the other hand, instanceof says that Function.prototype is not an instance of Function.
    > Function.prototype instanceof Function
    false
That’s because it doesn’t have Function.prototype in its prototype chain. Instead, its prototype is Object.prototype:
    > Object.getPrototypeOf(Function.prototype) === Object.prototype
    true
Expected. With other functions, there are no surprises:
    > typeof function () {}
    'function'
    > getClass(function () {})
    'Function'
    > function () {} instanceof Function
    true
Function is also a function in every sense:
    > typeof Function
    'function'
    > getClass(Function)
    'Function'
    > Function instanceof Function
    true

Array.prototype

Array.prototype is an empty array: It is displayed that way and has a length of 0.
    > Array.prototype
    []
    > Array.prototype.length
    0
[[Class]] also considers it an array:
    > getClass(Array.prototype)
    'Array'
So does Array.isArray(), which is based on [[Class]]:
    > Array.isArray(Array.prototype)
    true
Naturally, instanceof doesn’t:
    > Array.prototype instanceof Array
    false
We won’t mention prototype objects not being instances of their type for the remainder of this section.

RegExp.prototype

RegExp.prototype is a regular expression that matches everything:
    > RegExp.prototype.test("abc")
    true
    > RegExp.prototype.test("")
    true
RegExp.prototype is also accepted by String.prototype.match, which checks whether its argument is a regular expression via [[Class]]. And that check is positive for both regular expressions and the prototype object:
    > getClass(/abc/)
    'RegExp'
    > getClass(RegExp.prototype)
    'RegExp'
Excursion: the empty regular expression. RegExp.prototype is equivalent to the “empty regular expression”. That expression is created in either one of two ways:
    new RegExp("")  // constructor
    /(?:)/          // literal
You should only use the RegExp constructor if you are dynamically assembling a regular expression. Alas, expressing the empty regular expression via a literal is complicated by the fact that you can’t use //, which would start a comment. The empty non-capturing group (?:) behaves the same as the empty regular expression: It matches everything and does not create captures in a match.
    > new RegExp("").exec("abc")
    [ '', index: 0, input: 'abc' ]
    > /(?:)/.exec("abc")
    [ '', index: 0, input: 'abc' ]
Compare: with an empty capturing group, the result holds not only the complete match at index 0, but also the capture of that (first) group at index 1:
    > /()/.exec("abc")
    [ '',  // index 0
      '',  // index 1
      index: 0,
      input: 'abc' ]
Interestingly, both an empty regular expression created via the constructor and RegExp.prototype are displayed as the empty literal:
    > new RegExp("")
    /(?:)/
    > RegExp.prototype
    /(?:)/

Date.prototype

Date.prototype is also a date:
    > getClass(new Date())
    'Date'
    > getClass(Date.prototype)
    'Date'
Dates wrap numbers. Quoting the ECMAScript 5.1 specification:
A Date object contains a Number indicating a particular instant in time to within a millisecond. Such a Number is called a time value. A time value may also be NaN, indicating that the Date object does not represent a specific instant of time.

Time is measured in ECMAScript in milliseconds since 01 January, 1970 UTC.

Two common ways of accessing the time value are calling valueOf and coercing a date to number:
    > var d = new Date(); // now

    > d.valueOf()
    1347035199049
    > Number(d)
    1347035199049
The time value of Date.prototype is NaN:
    > Date.prototype.valueOf()
    NaN
    > Number(Date.prototype)
    NaN
Date.prototype is displayed as an invalid date, the same as dates that have been created via NaN:
    > Date.prototype
    Invalid Date
    > new Date(NaN)
    Invalid Date
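
Because an invalid date is simply a date whose time value is NaN, the time value can be used to tell valid from invalid dates. A minimal sketch; the helper isValidDate() is a hypothetical name and assumes that its argument is a date:

```javascript
// Hypothetical helper: a date is valid if its time value is not NaN.
function isValidDate(d) {
    return !isNaN(d.valueOf());
}

var epoch = new Date(0);     // time value 0: 01 January, 1970 UTC
var invalid = new Date(NaN); // time value NaN

console.log(epoch.valueOf());      // 0
console.log(isValidDate(epoch));   // true
console.log(isValidDate(invalid)); // false
console.log(String(invalid));      // Invalid Date
```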

Number.prototype

Number.prototype is roughly the same as new Number(0):
    > Number.prototype.valueOf()
    0
The conversion to number returns the wrapped primitive value:
    > +Number.prototype
    0
Compare:
    > +new Number(0)
    0

String.prototype

String.prototype is roughly the same as new String(""):
    > String.prototype.valueOf()
    ''
The conversion to string returns the wrapped primitive value:
    > "" + String.prototype
    ''
Compare:
    > "" + new String("")
    ''

Boolean.prototype

Boolean.prototype is roughly the same as new Boolean(false):
    > Boolean.prototype.valueOf()
    false
Boolean objects can be coerced to boolean (primitive) values, but the result of that coercion is always true, because converting any object to boolean always yields true.
    > !!Boolean.prototype
    true
    > !!new Boolean(false)
    true
    > !!new Boolean(true)
    true
That is different from how objects are converted to numbers or strings: If an object wraps these primitives, the result of a conversion is the wrapped primitive.
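
The difference between the three conversions can be seen side by side. A short sketch with hypothetical variable names; valueOf() is the way to get at the wrapped boolean:

```javascript
var num = new Number(7);
var str = new String("abc");
var bool = new Boolean(false);

// Conversion to number and to string returns the wrapped primitive.
console.log(+num);     // 7
console.log("" + str); // abc

// Conversion to boolean ignores the wrapped primitive.
console.log(!!bool);                    // true
console.log(bool ? "truthy" : "falsy"); // truthy

// valueOf() returns the wrapped boolean.
console.log(bool.valueOf()); // false
```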

Recommendations

This section gives recommendations for how to best categorize values in JavaScript.

Treating prototype objects as primal members of their types

Is a prototype object always a primal member of a type? No, that only holds for the built-in types. In general, that behavior of prototype objects is merely a curiosity; it is better to think of them as analogs to classes: they contain properties that are shared by all instances (usually methods).
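
A sketch with a user-defined type shows the class-like role. The constructor Point and its method norm() are hypothetical names chosen for illustration:

```javascript
function Point(x, y) {
    this.x = x;
    this.y = y;
}
// Shared by all instances, like a method defined in a class.
Point.prototype.norm = function () {
    return Math.sqrt(this.x * this.x + this.y * this.y);
};

var p = new Point(3, 4);
var q = new Point(6, 8);

console.log(p.norm());          // 5
console.log(p.norm === q.norm); // true (one method, shared via the prototype object)

// The prototype object itself is not a primal member of the type:
// it was never initialized by the constructor.
console.log(Point.prototype.x);                // undefined
console.log(Point.prototype instanceof Point); // false
```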

Which categorization mechanisms to use

When deciding on how to best use JavaScript’s quirky categorization mechanisms, you have to distinguish between normal code and code that might encounter values from other frames.

Normal code. For normal code, use typeof and instanceof and forget about [[Class]] and Array.isArray(). You have to be aware of typeof’s quirks: That null is considered an "object" and that there are two non-primitive categories: "object" and "function". For example, a function for determining whether a value v is an object would be implemented as follows.

    function isObject(v) {
        return (typeof v === "object" && v !== null)
            || typeof v === "function";
    }
Trying it out:
    > isObject({})
    true
    > isObject([])
    true

    > isObject("")
    false
    > isObject(undefined)
    false

Code that works with values from other frames. If you expect to receive values from other frames then instanceof is no longer reliable. You have to consider [[Class]] and Array.isArray(). An alternative is to work with the name of an object’s constructor but that is a brittle solution: not all objects record their constructor, not all constructors have a name and there is the risk of name clashes. The following function shows how to retrieve the name of the constructor of an object.

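    function getConstructorName(obj) {
        // null and undefined have no properties; accessing one would throw
        if (obj === null || obj === undefined) {
            return "";
        }
        // Not all objects record their constructor, not all constructors have a name
        if (obj.constructor && obj.constructor.name) {
            return obj.constructor.name;
        } else {
            return "";
        }
    }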
Note that the name property of functions (such as obj.constructor) was non-standard when this article was written (it was later standardized in ECMAScript 2015) and is, for example, not supported by Internet Explorer. Trying it out:
    > getConstructorName({})
    'Object'
    > getConstructorName([])
    'Array'
    > getConstructorName(/abc/)
    'RegExp'

    > function Foo() {}
    > getConstructorName(new Foo())
    'Foo'
If you apply getConstructorName() to a primitive value, you get the name of the associated wrapper type:
    > getConstructorName("")
    'String'
That’s because the primitive value gets the property constructor from the wrapper type:
    > "".constructor === String.prototype.constructor
    true
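
For values that may come from other frames, checks based on [[Class]] and Array.isArray() could be sketched as follows. The helpers isRegExp() and isDate() are hypothetical names; getClass() is repeated from earlier in this post so that the sketch is self-contained.

```javascript
function getClass(x) {
    var str = Object.prototype.toString.call(x);
    return /^\[object (.*)\]$/.exec(str)[1];
}

// Hypothetical helpers that do not depend on any frame's constructors.
function isRegExp(x) {
    return getClass(x) === "RegExp";
}
function isDate(x) {
    return getClass(x) === "Date";
}

console.log(Array.isArray([1, 2, 3])); // true
console.log(isRegExp(/abc/));          // true
console.log(isRegExp("abc"));          // false
console.log(isDate(new Date()));       // true
console.log(isDate(Date.now()));       // false (a number, not a date)
```

This approach only works for built-in types. Instances of your own constructors all have the [[Class]] "Object" and cannot be told apart this way.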

What to read next

In this article, you learned how to categorize values in JavaScript. It is unfortunate that one needs detailed knowledge in order to perform this task properly, as the two primary categorization operators are flawed: typeof has quirks (such as returning "object" for null) and instanceof cannot handle objects from other frames. The article included recommendations for working around those flaws.

As a next step, you can learn more about JavaScript inheritance. The following four blog posts will get you started: