2012-12-04

Arrays in JavaScript

Update 2012-12-17: New Sect. 3, “The Array constructor”.

This blog post goes deeply into how arrays work in JavaScript: they are much more like objects than you might think.

Overview

Remember that objects in JavaScript are maps from strings to values. Arrays are objects, only a few property keys are special:
  1. Array indices: If a key is a string holding a non-negative integer below a certain maximum then it is treated as an array index.
  2. "length": The value of this property is a non-negative integer holding the length of an array. That length is defined as the numerically largest array index, converted to integer, plus one.
This is somewhat shocking, especially when you are coming from another language: array indices in JavaScript are actually strings. Naturally, engines perform optimizations under the hood so that, internally, that is not true. But it is how the spec defines them and how the programmer sees them. For example:
    > var arr = ['a', 'b', 'c'];
    > arr['0']
    'a'
    > arr[0]
    'a'
Because 0 is not an identifier, the dot notation (arr.0) is a syntax error. Hence, you have to resort to square brackets. The square brackets operator converts its operand to string, which is why arr[0] works, above. Array indices are (roughly) limited to 32 bit. Similarly to the square brackets operator, the in operator also converts its first operand to string. Which is why it can be used with numbers to check whether or not there is an array element at a given index:
    > 2 in [ 'a', 'b', 'c' ]
    true
    > 3 in [ 'a', 'b', 'c' ]
    false
The following sections go into more detail about how arrays work.

Arrays with holes

As we have seen, arrays are maps from numbers (encoded as strings) to values. That means that arrays can have holes, indices smaller than the length that don’t exist as properties in the array. You can create holes in array literals by not writing any value before a comma (a trailing comma has no effect). Holes are skipped by iteration methods such as forEach and map. An array with holes is called sparse. Let’s compare a sparse array and a dense array:
    > var sparse = [ , , 'c' ];
    > var dense  = [ undefined, undefined, 'c' ];

    > 0 in sparse
    false
    > 0 in dense
    true

    > for(var i=0; i<sparse.length; i++) console.log(sparse[i]);
    undefined
    undefined
    c
    > for(var i=0; i<dense.length; i++) console.log(dense[i]);
    undefined
    undefined
    c

    > sparse.forEach(function (x) { console.log(x) });
    c
    > dense.forEach(function (x) { console.log(x) });
    undefined
    undefined
    c

    > sparse.map(function (x,i) { return i });
    [ , , 2 ]
    > dense.map(function (x,i) { return i });
    [ 0, 1, 2 ]

    > sparse.filter(function () { return true })
    [ 'c' ]
    > dense.filter(function () { return true })
    [ undefined, undefined, 'c' ]
For more information on dense and sparse arrays, consult [2].

The Array constructor

There are three ways of invoking the Array constructor:
  1. new Array(): creates a new empty array. The empty array literal [] is a more concise way of doing the same thing.
  2. new Array(len): creates an array with len holes. On some JavaScript engines, this operation lets you pre-allocate arrays, giving you performance benefits for small arrays (not for large ones). In most cases, performance doesn’t matter and you can avoid the redundancy introduced by a preallocation. If it fits the problem, an array literal initializing all elements at once is preferable.
  3. new Array(elem1, elem2, ...): creates an array whose elements are elem1, elem2 etc. This variant of the constructor is never a good choice, because if you use it for a single element that is a number, the variant #2 is invoked. To avoid this bug, use the array literal [ elem1, elem2, ...] which does everything this constructor variant does.
If you call Array as a function (without new) then the effect is the same as calling it as a constructor.

(As an aside, because there are not many use cases for this: JavaScript therefore does not have a built-in function for reliably creating arrays from elements. ECMAScript 6 will have Array.of() which rectifies this situation.)

Array indices

The ECMAScript specification has a very narrow definition for what property keys are considered array indices. A string s is an array index if and only if:
  1. s, parsed as an unsigned 32 bit integer and converted to string is equal to s.
  2. s, converted to integer, is smaller than 232−1 (the maximum length).
Thus, if compared numerically, an array index s must be within the range
0 ≤ s < 232−1
ToUint32, the conversion to an unsigned 32 bit integer, is a spec-internal operation. You can implement it in JavaScript as follows [1]:
    function ToUint32(x) {
        return x >>> 0;
    }
ToUint32 is very tolerant. However, condition (1), above, means that while many strings can be converted to an unsigned 32 bit integer, only a few of them are valid array indices. Examples:
    > ToUint32('0')
    0
    > ToUint32('00')
    0
    > ToUint32('03')
    3
    > ToUint32('abc')
    0
    > ToUint32(Math.pow(2,32)+3)
    3
Only the first example fulfills condition (1) and is a valid array index. All non-array indices are treated like normal properties:
    > var arr = ['a', 'b', 'c'];
    > arr['0']
    'a'
    > arr['00']
    undefined

length

The property length of arrays is always an integer value l in the range
0 ≤ l ≤ 232−1 (32 bit)

Tracking indices

As new elements are added, the length increases automatically:
    > var arr = [];
    > arr.length
    0
    > arr[0] = 'a';
    > arr.length
    1

Decreasing the length

if an existing length l is replaced with a newer value l' that is smaller, then all array elements are removed whose indices i are in the range
l'i < l
For example:
    > var arr = [ 'a', 'b', 'c' ];
    > arr.length
    3
    > 2 in arr
    true
    > arr.length = 2;
    2
    > 2 in arr
    false

Increasing the length

Setting the length to values larger than the current length creates holes:
    > var arr = ['a'];
    > arr.length = 3;
    > arr
    [ 'a', , ,]

Legal values for length

You can assign any value to length, but converting it to a number via ToUint32 and converting it to number must yield the same result. Examples:
    > ToUint32('0')  //*
    0
    > ToUint32('000')  //*
    0
    > ToUint32('-1')
    4294967295
    > ToUint32(Math.pow(2,32)-1)  //*
    4294967295
    > ToUint32(Math.pow(2,32))
    0
    > ToUint32('abc')
    0
Hence, all values marked with an asterisk are legal lengths, the remaining values aren’t:
    > Number('0')
    0
    > Number('000')
    0
    > Number('-1')
    -1
    > Number(Math.pow(2,32)-1)
    4294967295
    > Number(Math.pow(2,32))
    4294967296
    > Number('abc')
    NaN
You can easily test this:
    > [].length = -1
    RangeError: Invalid array length
    > [].length = Math.pow(2,32)
    RangeError: Invalid array length
    > [].length = 'abc'
    RangeError: Invalid array length

Array instances

Instances of Array are like normal objects, except for one difference: If you define a property, two cases are handled differently:
  • Array indices: increase length, if necessary.
  • "length": throw an error for illegal values, possibly remove elements if the new value is smaller than the previous one.
All other property keys are handled the same as for normal objects. Note that property definition is also used by the assignment operator (it calls the internal [[Put]] method which in turn calls [[DefineOwnProperty]] – if there isn’t a setter visible in the prototype chain).

The above means that it is impossible to create a subtype of Array using standard ECMAScript 5. “Subtyping JavaScript built-ins” has details.

Beyond the limits

What happens if you use an array index that is not within the allowed range? Then it is treated like a normal property key. If you use such an index to add to an array, your addition is not considered an array element. For example, let’s use an index that is too big.
    > var arr = ['a', 'b'];
    > arr[Math.pow(2,32)-1] = 'c';
    > arr
    [ 'a', 'b' ]
    > arr.length
    2
    > arr[Math.pow(2,32)-1]
    'c'
You do get an error if you try to grow the length beyond the allowed maximum:
    > var arr = new Array(Math.pow(2,32)-1)  // max length
    > arr.push('x')
    RangeError: Invalid array length

Recommendations

Recommendations when working with arrays:
  • Pretend array indices are numbers. That’s what usually happens under the hood and the general direction in which ECMAScript is moving.
  • Avoid the Array constructor.
  • Use array literals whenever you can.
  • Don’t be clever with regard to arrays. If you follow standard patterns, engines usually figure out what you are trying to do and optimize accordingly. The article “Performance Tips for JavaScript in V8” (by Chris Wilson) contains several interesting array-related suggestions.
If you liked this blog post, you will probably also like “Object properties in JavaScript”.

Related blog posts

  1. Integers and shift operators in JavaScript
  2. JavaScript: sparse arrays vs. dense arrays

No comments: