2013-02-16

asm.js: closing the gap between JavaScript and native

Update 2013-12-30:

asm.js defines a subset of JavaScript that can be compiled to fast executables. It has been created at Mozilla by David Herman, Luke Wagner and Alon Zakai. According to the specification, “[asm.js] effectively describes a safe virtual machine for memory-unsafe languages like C or C++.” This blog post describes how asm.js works; it is based on the specification.

Things that slow down JavaScript

Current JavaScript is already quite fast, but a few mechanisms in engines limit its speed:
  • Boxing: Floating point numbers (including integers stored as floating point numbers) are boxed, that is, they are wrapped so that they can co-exist with other values such as objects.
  • Just-in-time (JIT) compilation and runtime type checks: Most JavaScript engines compile code in two stages. Initially, a format is used that can be compiled to quickly, but that runs slowly (e.g. interpreted bytecode). The execution of that format is observed. If it runs more often, assumptions can be made about the types of its parameters etc. and it can be compiled to a format that runs faster. If one of the assumptions turns out to be wrong, the faster format can’t be used any more and the engine has to go back to the slower format. The faster format is always slowed down by checking whether the assumptions still hold.
  • Automated garbage collection: reclaiming unused memory automatically can be slow.
  • Flexible memory layout: JavaScript’s data structures are very flexible, but they also make memory management slower.
asm.js code can produce executables that exhibit none of these drawbacks. They can be compiled “ahead of time” (before execution) and are faster than JIT-compiled ones:
“asm.js can be implemented massively faster than anything existing JavaScript engines can do, and it’s closing the gap to native more than ever.”
David Herman
The asm.js specification only describes what JavaScript code has to look like to be asm.js-compliant; the semantics follow from the ECMAScript language specification. That is, asm.js is a true subset of JavaScript.

How asm.js works

asm.js code is packaged in specially marked functions (“asm.js modules”) that have the following structure:
    function MyAsmModule(stdlib, foreign, heap) {
        "use asm";  // marks this function as an asm.js module

        // module body:

        function f1(...) { ... }
        function f2(...) { ... }
        ...

        return {
            export1: f1,
            export2: f2,
            ...
        };
    }
Two steps are performed before asm.js code can be used:
  • Ahead of time (AOT) compilation: a complete fast executable can be produced when the code is loaded (compare: JITs only produce a slow version at load time).
  • Linking: The asm.js module function is invoked and linked to its external dependencies stdlib and foreign (see below).
All three parameters are optional (if any of them are missing, appropriate default values are created):
  • stdlib: a standard library object, providing access to (a subset of) the standard library.
  • foreign: a foreign function interface (FFI) providing access to arbitrary external JavaScript functions.
  • heap: a heap buffer, an instance of ArrayBuffer that acts as the asm.js heap.
The functions in the returned object can be invoked from non-asm.js code like all other JavaScript functions.

Example. Let’s look at a concrete example:

    function DiagModule(stdlib) {
        "use asm";

        var sqrt = stdlib.Math.sqrt;

        function square(x) {
            x = +x;
            return +(x*x);
        }

        function diag(x, y) {
            x = +x;
            y = +y;
            return +sqrt(square(x) + square(y));
        }

        return { diag: diag };
    }
The following code links and uses this asm.js module.
    // Browsers: this === window
    var fast = DiagModule(this);   // link the module
    console.log(fast.diag(3, 4));  // 5
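
The diag module only uses stdlib. As a rough sketch of how the other two parameters might be used, the following module reads doubles from the heap and reports the result through a foreign function. The names SumModule, sum and log are made up for this illustration. globalThis stands in for the global object so that the code also runs outside browsers, and the 64 KiB heap size is an assumption chosen to stay on the safe side of whatever size rules an engine applies.

```javascript
function SumModule(stdlib, foreign, heap) {
    "use asm";

    var values = new stdlib.Float64Array(heap);  // typed view on the heap
    var log = foreign.log;                       // external JavaScript function

    function sum(n) {
        n = n|0;
        var i = 0;
        var total = 0.0;
        for (i = 0; (i|0) < (n|0); i = (i + 1)|0) {
            // Heap access: byte offset (i << 3) shifted back to an element index
            total = total + +values[i << 3 >> 3];
        }
        log(total);
        return +total;
    }

    return { sum: sum };
}

// Linking: prepare the heap and the foreign functions, then invoke the module.
var heap = new ArrayBuffer(0x10000);  // 64 KiB (assumed size)
new Float64Array(heap).set([1.5, 2.5, 3.0]);

var logged = [];
var mod = SumModule(globalThis, { log: function (x) { logged.push(x); } }, heap);
console.log(mod.sum(3));  // 7
```

Engines without asm.js support run this as ordinary JavaScript and produce the same result.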

Standard library

asm.js has very limited access to JavaScript’s standard library. Only the following values can be accessed:
  • Global double values: Infinity, NaN
  • Double functions (arity 1): Math.acos, Math.asin, Math.atan, Math.cos, Math.sin, Math.tan, Math.ceil, Math.floor, Math.exp, Math.log, Math.sqrt
  • Double functions (arity 2): Math.atan2, Math.pow
  • Integer or double function (arity 1): Math.abs
  • Integer function (arity 2, proposed for ECMAScript 6): Math.imul (integer multiplication)
  • Double values: Math.E, Math.LN10, Math.LN2, Math.LOG2E, Math.LOG10E, Math.PI, Math.SQRT1_2, Math.SQRT2
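
As a small illustration of the list above, the following sketch imports one integer function, one double function and one double value from stdlib. The module and function names (StdlibDemoModule, mulInt, circleArea) are hypothetical, and globalThis is assumed as the standard library object.

```javascript
function StdlibDemoModule(stdlib) {
    "use asm";

    var imul = stdlib.Math.imul;  // integer function (arity 2)
    var pow = stdlib.Math.pow;    // double function (arity 2)
    var PI = stdlib.Math.PI;      // double value

    function mulInt(a, b) {
        a = a|0;
        b = b|0;
        return imul(a, b)|0;
    }

    function circleArea(r) {
        r = +r;
        return +(PI * +pow(r, 2.0));
    }

    return { mulInt: mulInt, circleArea: circleArea };
}

var demo = StdlibDemoModule(globalThis);
console.log(demo.mulInt(3, 4));              // 12
console.log(demo.mulInt(65536, 65536));      // 0 (the 32 bit result wraps around)
console.log(demo.circleArea(1) === Math.PI); // true
```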

Static typing

asm.js code is statically typed. You statically specify the type of a variable declaration via its initializer. For example:
    var a = 0;    // a has type int
    var b = 0.0;  // b has type double
You statically specify the types of parameters and return values via type annotations. For example:
    function foo(x, y) {
        x = x|0;   // x has type int (parameters are annotated without var)
        y = +y;    // y has type double
        return +(+(x|0) * y);   // convert x to double first; function returns a double
    }
The annotations tell the compiler what type to expect and also coerce arguments to the correct type.
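
The coercing effect of the two annotations can be observed in plain JavaScript. The helper functions toInt and toDouble below are hypothetical and only exist to isolate the operators:

```javascript
// Hypothetical helpers that isolate the two annotations.
function toInt(v) { return v|0; }
function toDouble(v) { return +v; }

console.log(toInt(3.7));          // 3 (the fraction is dropped)
console.log(toInt(-3.7));         // -3
console.log(toInt(2147483648));   // -2147483648 (wraps around at 32 bits)
console.log(toInt("42"));         // 42
console.log(toDouble("2.5"));     // 2.5
console.log(toDouble(undefined)); // NaN
```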

Supported types

Value types. asm.js supports:
  • 64 bit double-precision floating point numbers. Type annotation: +x
  • 32 bit integers (ignoring several types that are needed so that all asm.js-supported JavaScript operations can be typed correctly). Type annotation: x|0
  • 32 bit floats. Type annotation: Math.fround(x)
asm.js’s doubles are the same as JavaScript’s. You can’t work with integers directly in JavaScript, but 32 bit integers are used internally [1].
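
A minimal sketch of the float annotation, assuming an engine that provides Math.fround. The names FloatModule and addf are invented for this example:

```javascript
function FloatModule(stdlib) {
    "use asm";

    var fround = stdlib.Math.fround;

    function addf(a, b) {
        a = fround(a);         // a has type float
        b = fround(b);         // b has type float
        return fround(a + b);  // function returns a float
    }

    return { addf: addf };
}

var floats = FloatModule(globalThis);
console.log(floats.addf(1.5, 2.25));     // 3.75 (exactly representable in 32 bits)
console.log(Math.fround(0.1) === 0.1);   // false (0.1 loses precision as a 32 bit float)
```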

Reference types. Reference types are only allowed for variable declarations at the top level of a module. All other variables and parameters must have value types. The following reference types are available:

  • ArrayBufferView types: Int8Array, Uint8Array, Int16Array, Uint16Array, Int32Array, Uint32Array, Float32Array, Float64Array. These types are used for accessing the asm.js heap.
  • Functions
  • Function tables: an array of functions that all have the same type
  • References to foreign functions

asm.js-specific features in ECMAScript 6. Two ECMAScript 6 features were added specifically to better support asm.js and similar approaches.

Checking for asm.js conformance

Static checks (at compile time). The code of an asm.js module is statically checked (when the JavaScript code is loaded): It must only use the declarations, statements and expressions that are part of asm.js (e.g. almost none of the OOP features of JavaScript are). And it must be well-typed, according to the static type system. If the checks fail, then asm.js code can’t be AOT-compiled and is executed as normal JavaScript code.

Dynamic checks (at link time). When you invoke an asm.js module, the following dynamic checks are performed. If one of them fails, the AOT-compiled code can’t be linked and the engine must fall back to normal JavaScript.

  • No exception must be thrown until the return statement is reached.
  • The heap object (if provided) must be an instance of ArrayBuffer. Its byteLength must be a multiple of 8.
  • All view objects must be true instances of their respective typed array types.
  • All properties of the stdlib object must implement the semantics as specified by the ECMAScript standard. In practice, that means that they must have the same values as the properties of the global object that have the same names.
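
The heap rule can be approximated in ordinary JavaScript. The helper looksLikeValidHeap is hypothetical and only encodes the rule as stated above; real engines may apply stricter size constraints.

```javascript
// Sketch of the heap check described above (not an engine's actual implementation).
function looksLikeValidHeap(heap) {
    return heap instanceof ArrayBuffer && heap.byteLength % 8 === 0;
}

console.log(looksLikeValidHeap(new ArrayBuffer(4096)));  // true
console.log(looksLikeValidHeap(new ArrayBuffer(12)));    // false (not a multiple of 8)
console.log(looksLikeValidHeap(new Uint8Array(4096)));   // false (a view, not a buffer)
```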

Advantages of asm.js

The approach taken by asm.js has the following benefits:
  • Relatively easy to implement on top of existing JavaScript engines. Quoting David Herman:
    [...] it’s significantly easier to implement in an existing JavaScript engine than from-scratch technologies like NaCl/PNaCl. Luke Wagner has implemented our optimizing asm.js engine entirely by himself in the matter of a few months.
  • Interacts well with JavaScript. It is a subset of JavaScript, after all.
  • Backward compatible with all existing JavaScript engines: if an engine isn’t aware of asm.js then the code simply runs as normal JavaScript.

Emscripten and the performance

To see what is possible with asm.js, you only have to look at Emscripten, which is described as follows on its web site:
Emscripten is an LLVM-to-JavaScript compiler. It takes LLVM bitcode (which can be generated from C/C++, using llvm-gcc or clang, or any other language that can be converted into LLVM) and compiles that into JavaScript, which can be run on the web – or anywhere else JavaScript can run.
The list of projects that have been compiled via Emscripten is impressive: SQLite, Graphviz, LaTeX, Python, etc.

Emscripten already produces surprisingly fast code. In fact, its way of code generation has been the inspiration for asm.js and its creator Alon Zakai is part of the asm.js team. A modified version of Emscripten now targets asm.js, which results in considerable performance increases on engines with the necessary support.

Initially, C code compiled to asm.js ran at 50% of native speed. Support for 32 bit floats pushed performance to approximately 70% of native.

The future

Several features are on the horizon for JavaScript that will also benefit asm.js:
  • Modules: ECMAScript 6 will have modules, which will allow asm.js code to be packaged more conveniently.
  • Type guards: Versions after ECMAScript 6 [2] might have type guards, obviating the need for the current, slightly hacky, type annotations.
  • Better parallel programming: better support for data parallelism could come via either ParallelJS [3] or SIMD [4].
  • More value objects are also planned for ECMAScript, with 64 bit integers having priority. Once JavaScript has them, they are also available to asm.js.
Furthermore, asm.js might be expanded to be a more full-fledged virtual machine in the future, similar to the Java Virtual Machine or Microsoft’s Common Language Runtime. Quoting the asm.js FAQ:
Right now, asm.js has no direct access to garbage-collected data; an asm.js program can only interact indirectly with external data via numeric handles. In future versions we intend to introduce garbage collection and structured data based on the [ES7] structured binary data API, which will make asm.js an even better target for managed languages.

Support in JavaScript engines

  • Firefox: optimized for asm.js since version 22
  • Chrome and Opera: optimize asm.js code without requiring the directive 'use asm';

Conclusion: the best of both worlds

asm.js is impressive technology. There will probably always be people who would prefer a single standardized bytecode for the web, but asm.js proves that different approaches are possible and beneficial. It gives you the best of both worlds:
  • Low-level code: use asm.js for computationally intensive tasks and as a target language for compilers. The latter is the dominant asm.js use case. That is, it is not meant to be written by hand, but to be generated via tools such as Emscripten (source languages: C, C++) and LLJS (source language: static dialect of JavaScript).
  • High-level code: use all of JavaScript for maximum flexibility.

The new role of JavaScript source code. JavaScript source code becomes a format for delivering programs that abstracts over the different compilation strategies of JavaScript engines [5] and over the difference between asm.js and JavaScript. Delivering source code is not an approach that opposes a particular compilation strategy (bytecode etc.); you simply postpone deciding on one, giving engines the freedom to make their own choice. Additionally, this approach allows you to compile asm.js code in the browser: simply assemble a text string with the code and use eval() or (better) Function() to compile it.
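
A minimal sketch of compiling a module from a text string via Function(). The source text and the names AddModule and add are made up for this illustration; whether an engine applies its asm.js optimizations to code created this way is an assumption.

```javascript
// Assemble the body of an asm.js module as a string.
var source =
    '"use asm";\n' +
    'function add(a, b) { a = a|0; b = b|0; return (a + b)|0; }\n' +
    'return { add: add };';

// The first three arguments name the module parameters, the last one is the body.
var AddModule = new Function("stdlib", "foreign", "heap", source);

var adder = AddModule(globalThis);
console.log(adder.add(2, 3));  // 5
```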

JavaScript engines are optimized for higher-level code in a manner that can’t be replicated by less specialized engines. Thus, there will always be a schism between low-level and high-level engines; asm.js manages to make that schism as small as possible.

References

  1. Integers and shift operators in JavaScript
  2. A first look at what might be in ECMAScript 7 and 8
  3. ParallelJS: data parallelism for JavaScript
  4. JavaScript gains support for SIMD
  5. JavaScript myth: JavaScript needs a standard bytecode