2013-02-16

asm.js: closing the gap between JavaScript and native

asm.js defines a subset of JavaScript that can be compiled to fast executables. It was created at Mozilla by David Herman, Luke Wagner and Alon Zakai. According to the specification, “[asm.js] effectively describes a safe virtual machine for memory-unsafe languages like C or C++.” This blog post describes how asm.js works; it is based on the specification.

Things that slow down JavaScript

Current JavaScript is already quite fast, but a few mechanisms in engines limit its speed:
  • Boxing: Floating point numbers (including integers stored as floating point numbers) are boxed: they have wrappers that allow them to co-exist with other values such as objects.
  • Just-in-time (JIT) compilation and runtime type checks: Most JavaScript engines compile code in two stages. Initially, a format is used that can be compiled to quickly, but that runs slowly (e.g. interpreted bytecode). The execution of that format is observed. If a piece of code runs often enough, assumptions can be made about the types of its parameters etc. and it can be compiled to a format that runs faster. If one of the assumptions turns out to be wrong, the faster format can’t be used any more and the engine has to go back to the slower format. Even while the assumptions hold, the faster format is slowed down by the checks that verify them.
  • Automated garbage collection: Reclaiming memory automatically can be slow.
  • Flexible memory layout: JavaScript’s data structures are very flexible, but they also make memory management slower.
asm.js code can be compiled to executables that exhibit none of these drawbacks. It can be compiled “ahead of time” (before execution), and the result is faster than JIT-compiled code:
“asm.js can be implemented massively faster than anything existing JavaScript engines can do, and it’s closing the gap to native more than ever.”
David Herman
The asm.js specification only describes what JavaScript code has to look like to be asm.js-compliant; the semantics follow from the ECMAScript language specification. That is, asm.js is a true subset of JavaScript.
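To make the point about runtime type checks concrete, here is a minimal sketch in plain JavaScript. The function name addAny is made up for illustration. Nothing in the code fixes the types of the parameters, so an engine that has specialized the function for numbers still needs a check in the fast version, in case a string shows up later.

```javascript
// Hypothetical example: the types of a and b are only known at run time.
function addAny(a, b) {
    return a + b;
}

var r1 = addAny(1, 2);      // number addition
var r2 = addAny("1", 2);    // string concatenation
console.log(r1, typeof r1); // 3 number
console.log(r2, typeof r2); // 12 string
```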

How asm.js works

asm.js code is packaged in specially marked functions (“asm.js modules”) that have the following structure:
    function MyAsmModule(stdlib, foreign, heap) {
        "use asm";  // marks this function as an asm.js module

        // module body:

        function f1(...) { ... }
        function f2(...) { ... }
        ...

        return {
            export1: f1,
            export2: f2,
            ...
        };
    }
Two steps are performed before asm.js code can be used:
  • Ahead of time (AOT) compilation: a complete fast executable can be produced when the code is loaded (compare: JITs only produce a slow version at load time).
  • Linking: The asm.js module function is invoked and linked to its external dependencies stdlib and foreign (see below).
All three parameters are optional (if any of them are missing, appropriate default values are created):
  • stdlib: a standard library object, providing access to (a subset of) the standard library.
  • foreign: a foreign function interface (FFI) providing access to arbitrary external JavaScript functions.
  • heap: a heap buffer, an instance of ArrayBuffer that acts as the asm.js heap.
The functions in the returned object can be invoked from non-asm.js code like all other JavaScript functions.

Example. Let’s look at a concrete example:

    function DiagModule(stdlib) {
        "use asm";

        var sqrt = stdlib.Math.sqrt;

        function square(x) {
            x = +x;
            return +(x*x);
        }

        function diag(x, y) {
            x = +x;
            y = +y;
            return +sqrt(square(x) + square(y));
        }

        return { diag: diag };
    }
The following code links and uses this asm.js module.
    // Browsers: this === window
    var fast = DiagModule(this);   // link the module
    console.log(fast.diag(3, 4));  // 5
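The example above only uses stdlib. The following sketch uses all three parameters and links the same module twice, each time with a different heap. The names SumModule, report, heapA and heapB are made up for illustration. The heap size of 64 KiB is an assumption, chosen to be a multiple of 8, and globalThis stands in for the global object so that the code also runs outside browsers.

```javascript
function SumModule(stdlib, foreign, heap) {
    "use asm";

    var HEAP32 = new stdlib.Int32Array(heap); // int view on the asm.js heap
    var report = foreign.report;              // external JavaScript function

    // Sums the first n 32-bit integers stored in the heap.
    function sum(n) {
        n = n|0;
        var i = 0, total = 0;
        for (i = 0; (i|0) < (n|0); i = (i + 1)|0) {
            total = (total + (HEAP32[(i << 2) >> 2]|0))|0;
        }
        report(total|0);
        return total|0;
    }

    return { sum: sum };
}

// Non-asm.js code: prepare two heaps and a foreign function, then link twice.
var reported = [];
var ffi = { report: function (value) { reported.push(value); } };

var heapA = new ArrayBuffer(0x10000);
var heapB = new ArrayBuffer(0x10000);
new Int32Array(heapA).set([1, 2, 3]);
new Int32Array(heapB).set([10, 20, 30]);

var a = SumModule(globalThis, ffi, heapA);
var b = SumModule(globalThis, ffi, heapB);
console.log(a.sum(3));  // 6
console.log(b.sum(3));  // 60
```

Both instances share the same module code, but each one reads from its own ArrayBuffer.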

Standard library

asm.js has very limited access to JavaScript’s standard library. Only the following values can be accessed:
  • Global double values: Infinity, NaN
  • Double functions (arity 1): Math.acos, Math.asin, Math.atan, Math.cos, Math.sin, Math.tan, Math.ceil, Math.floor, Math.exp, Math.log, Math.sqrt
  • Double functions (arity 2): Math.atan2, Math.pow
  • Integer or double function (arity 1): Math.abs
  • Integer function (arity 2, proposed for ECMAScript 6): Math.imul (integer multiplication)
  • Double values: Math.E, Math.LN10, Math.LN2, Math.LOG2E, Math.LOG10E, Math.PI, Math.SQRT1_2, Math.SQRT2
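As a sketch of how a module might import some of these values, the following code pulls Math.imul, Math.abs and Math.PI out of stdlib. The module and function names are made up for illustration, and globalThis is used as the stdlib object so that the code also runs outside browsers.

```javascript
function MathModule(stdlib) {
    "use asm";

    var imul = stdlib.Math.imul;  // integer function (arity 2)
    var abs = stdlib.Math.abs;    // integer or double function (arity 1)
    var PI = stdlib.Math.PI;      // double value

    function mulInt(a, b) {
        a = a|0;
        b = b|0;
        return imul(a, b)|0;
    }

    function distance(a, b) {
        a = +a;
        b = +b;
        return +abs(a - b);
    }

    function circleArea(r) {
        r = +r;
        return +(PI * r * r);
    }

    return { mulInt: mulInt, distance: distance, circleArea: circleArea };
}

var m = MathModule(globalThis);
console.log(m.mulInt(3, -4));         // -12
console.log(m.mulInt(65536, 65536));  // 0 (the product wraps around at 32 bits)
console.log(m.distance(1.5, 4));      // 2.5
```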

Static typing

asm.js code is statically typed. You statically specify the type of a variable declaration via its initializer. For example:
    var a = 0;    // a has type int
    var b = 0.0;  // b has type double
You statically specify the types of parameters and return values via type annotations. For example:
    function foo(x, y) {
        x = x|0;   // x has type int
        y = +y;    // y has type double
        // x is converted to double before the multiplication;
        // the outer + marks the return type as double
        return +(+(x|0) * y);
    }
The annotations tell the compiler what type to expect and also coerce arguments to the correct type.
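The coercion is easy to observe from non-asm.js code. In the following sketch (module and function names are made up for illustration), the arguments are deliberately passed with the “wrong” types:

```javascript
function CoerceModule() {
    "use asm";

    // n is annotated as int, the return value as double.
    function half(n) {
        n = n|0;
        return +(+(n|0) / 2.0);
    }

    // x is annotated as double, the return value as double.
    function twice(x) {
        x = +x;
        return +(x * 2.0);
    }

    return { half: half, twice: twice };
}

var c = CoerceModule();
console.log(c.half(7.9));     // 3.5 (7.9 is truncated to the int 7)
console.log(c.half("9"));     // 4.5 (the string is converted to the int 9)
console.log(c.twice("1.5"));  // 3   (the string is converted to the double 1.5)
```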

Supported types

Value types. Roughly, asm.js supports 64 bit double-precision floating point numbers and 32 bit integers (ignoring several types that are needed so that all asm.js-supported JavaScript operations can be typed correctly). asm.js’s doubles are the same as JavaScript’s. You can’t work with integers directly in JavaScript, but 32 bit integers (similar to asm.js’s integers) are used internally [1].

Reference types. Reference types are only allowed for variable declarations at the top level of a module. All other variables and parameters must have value types. The following reference types are available:

  • ArrayBufferView types: Int8Array, Uint8Array, Int16Array, Uint16Array, Int32Array, Uint32Array, Float32Array, Float64Array. These types are used for accessing the asm.js heap.
  • Functions
  • Function tables: an array of functions that all have the same type
  • References to foreign functions
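A function table might look as follows. This is only a sketch with made-up names. It assumes the rules of the specification that a table is declared after the functions, that its length is a power of two and that the index is masked with the length minus one.

```javascript
function TableModule() {
    "use asm";

    function inc(x) {
        x = x|0;
        return (x + 1)|0;
    }

    function dec(x) {
        x = x|0;
        return (x - 1)|0;
    }

    // Calls the table entry selected by i; the mask keeps the index in range.
    function apply(i, x) {
        i = i|0;
        x = x|0;
        return ops[i & 1](x)|0;
    }

    // Function table: both entries have the same type (int) -> int.
    var ops = [inc, dec];

    return { apply: apply };
}

var t = TableModule();
console.log(t.apply(0, 5));  // 6
console.log(t.apply(1, 5));  // 4
```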

Checking for asm.js conformance

Static checks (at compile time). The code of an asm.js module is statically checked (when the JavaScript code is loaded): It must only use the declarations, statements and expressions that are part of asm.js (almost none of JavaScript’s OOP features are, for example). And it must be well-typed according to the static type system. If the checks fail, the code can’t be AOT-compiled and is executed as normal JavaScript code.

Dynamic checks (at link time). When you invoke an asm.js module, the following dynamic checks are performed. If one of them fails, the AOT-compiled code can’t be linked and the engine must fall back to normal JavaScript.

  • No exception must be thrown until the return statement is reached.
  • The heap object (if provided) must be an instance of ArrayBuffer. Its byteLength must be a multiple of 8.
  • All view objects must be true instances of their respective typed array types.
  • All properties of the stdlib object must implement the semantics as specified by the ECMAScript standard. In practice, that means that they must have the same values as the properties of the global object that have the same names.
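Some of these checks can be mimicked in plain JavaScript. The helper below is hypothetical: it is not part of asm.js or of any engine, it expects a stdlib object, and it only looks at a handful of Math functions. It is meant as a sketch of the kind of conditions an engine verifies before linking.

```javascript
// Hypothetical helper that reports why linking might fail.
function findLinkProblems(stdlib, heap) {
    var problems = [];

    // The heap (if provided) must be an ArrayBuffer whose size is a multiple of 8.
    if (heap !== undefined) {
        if (!(heap instanceof ArrayBuffer)) {
            problems.push("heap is not an ArrayBuffer");
        } else if (heap.byteLength % 8 !== 0) {
            problems.push("heap.byteLength is not a multiple of 8");
        }
    }

    // stdlib members must be the genuine built-ins (only three are checked here).
    ["sqrt", "abs", "imul"].forEach(function (name) {
        if (!stdlib || !stdlib.Math || stdlib.Math[name] !== Math[name]) {
            problems.push("stdlib.Math." + name + " is not the built-in");
        }
    });

    return problems;
}

var fakeStdlib = {
    Math: { sqrt: function (x) { return x; }, abs: Math.abs, imul: Math.imul }
};
console.log(findLinkProblems(globalThis, new ArrayBuffer(64)).length);  // 0
console.log(findLinkProblems(globalThis, new ArrayBuffer(12))[0]);
// heap.byteLength is not a multiple of 8
console.log(findLinkProblems(fakeStdlib, new ArrayBuffer(64))[0]);
// stdlib.Math.sqrt is not the built-in
```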

Advantages of asm.js

The approach taken by asm.js has the following benefits:
  • Relatively easy to implement on top of existing JavaScript engines. Quoting David Herman:
    [...] it’s significantly easier to implement in an existing JavaScript engine than from-scratch technologies like NaCl/PNaCl. Luke Wagner has implemented our optimizing asm.js engine entirely by himself in the matter of a few months.
  • Interacts well with JavaScript. It is a subset of JavaScript, after all.
  • Backward compatible with all existing JavaScript engines: if an engine isn’t aware of asm.js then the code simply runs as normal JavaScript.

Emscripten

To see what is possible with asm.js, you only have to look at Emscripten, which is described as follows on its web site:
Emscripten is an LLVM-to-JavaScript compiler. It takes LLVM bitcode (which can be generated from C/C++, using llvm-gcc or clang, or any other language that can be converted into LLVM) and compiles that into JavaScript, which can be run on the web – or anywhere else JavaScript can run.
The list of projects that have been compiled via Emscripten is impressive: SQLite, Graphviz, LaTeX, Python, etc.

Emscripten already produces surprisingly fast code. In fact, its way of code generation has been the inspiration for asm.js and its creator Alon Zakai is part of the asm.js team. A modified version of Emscripten now targets asm.js, which will result in considerable performance increases on engines with the necessary support. First performance numbers are already in:

Benchmark   OdinMonkey   SpiderMonkey   V8
skinning    2.46         12.90          59.35
zlib        1.61         5.15           5.95
bullet      1.79         12.31          9.30
OdinMonkey is Firefox’s SpiderMonkey with asm.js support; V8 is Google’s JavaScript engine. The numbers denote how many times slower the code is than the same code compiled via gcc -O2 (1.0 would mean “same speed”).

The future

Several features are on the horizon for JavaScript that will also benefit asm.js:
  • Modules: ECMAScript 6 will have modules, which will allow asm.js code to be packaged more conveniently.
  • Type guards: Versions after ECMAScript 6 [2] might have type guards, obviating the need for the current, slightly hacky, type annotations.
  • Better parallel programming: better support for data parallelism could come via either River Trail [3] or SIMD [2].
  • More value objects are also planned for ECMAScript, with 64 bit integers having priority. Once JavaScript has them, they are also available to asm.js.
Furthermore, asm.js might be expanded to be a more full-fledged virtual machine in the future, similar to the Java Virtual Machine or Microsoft’s Common Language Runtime. Quoting the asm.js FAQ:
Right now, asm.js has no direct access to garbage-collected data; an asm.js program can only interact indirectly with external data via numeric handles. In future versions we intend to introduce garbage collection and structured data based on the ES6 structured binary data API, which will make asm.js an even better target for managed languages.

Availability

Quoting Herman:
As the site says, the spec is a work in progress but it's nearly done. Our prototype implementation in Firefox is almost done and will hopefully land in the coming few months.

Conclusion

asm.js is impressive technology. There will probably always be people who would prefer a single standardized bytecode for the web, but asm.js proves that different approaches are possible and beneficial. It gives you the best of both worlds:
  • Low-level code: use asm.js for computationally intensive tasks and/or code that can be compiled to LLVM bitcode.
  • High-level code: use all of JavaScript for maximum flexibility.
JavaScript source code becomes a format for delivering programs that abstracts over the different compilation strategies of JavaScript engines [4] and over the difference between asm.js and JavaScript. Delivering source code does not oppose any particular compilation strategy (bytecode etc.); you simply postpone deciding on one, giving engines the freedom to make their own choice. Additionally, this approach allows you to compile asm.js code in the browser: simply assemble a text string with the code and use eval() or Function() to compile it.
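Compiling from a string could look like the following sketch. The module body and all names are made up for illustration. Whether a given engine AOT-compiles code created this way is an assumption. On engines without asm.js support it simply runs as normal JavaScript.

```javascript
// Assemble the body of an asm.js module as text.
var source = [
    '"use asm";',
    'function add(x, y) {',
    '    x = x|0;',
    '    y = y|0;',
    '    return (x + y)|0;',
    '}',
    'return { add: add };'
].join('\n');

// Function() turns the text into a module function with the usual parameters.
var AddModule = new Function('stdlib', 'foreign', 'heap', source);

var adder = AddModule(globalThis);  // link the module
console.log(adder.add(2, 3));       // 5
```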

JavaScript engines are optimized for higher-level code in a manner that can’t be replicated by less specialized engines. Thus, there will always be a schism between low-level and high-level engines; asm.js manages to make that schism as small as possible.

More material

References

  1. Integers and shift operators in JavaScript
  2. A first look at what might be in ECMAScript 7 and 8
  3. JavaScript: parallel programming via River Trail coming to Firefox
  4. JavaScript myth: JavaScript needs a standard bytecode

9 comments:

Luke Wagner said...

Great post! If I could hazard one tiny nit: the AOT compilation happens when the asm.js module (the function containing the "use asm" directive) is *parsed* (we type-check and emit code directly from the parse tree). The "linking" step is what happens when the asm.js module is called with the stdlib/foreign/heap arguments. To wit, a single asm.js module can be linked several times (associating a single piece of code with multiple array buffers).

Axel Rauschmayer said...

Thanks! That makes sense. I misunderstood the first sentence of Sect. 7: “An AOT implementation of asm.js must perform some internal dynamic checks at link time to be able to safely generate AOT-compiled exports.”

“AOT implementation of asm.js” sounded like “AOT compiler”. “Generate” also led me astray, but only because I already had the wrong impression.

The example has a comment that (IMO) is also a bit misleading:
// produces AOT-compiled version

Luke Wagner said...

Thanks for that feedback, we'll try to improve the wording!

Axel Rauschmayer said...

Thanks for your feedback, too. I’ve rewritten the post.

Ralph Haygood said...

I'm curious how big the compiled code tends to be. If it's not too big, it might sometimes be desirable to be able to serve compiled code to browsers.

Luke Wagner said...

Although more measurement is needed, this is a positive indication:

http://mozakai.blogspot.com/2011/11/code-size-when-compiling-to-javascript.html

Axel Rauschmayer said...

Wondering: With C++ (as opposed to C), do you think it’s faster to compile it to asm.js code or to object-oriented JavaScript?

Luke Wagner said...

Definitely faster to compile C++ to asm.js; Emscripten already does this, in fact. (From a low-level perspective, C++ is just C with a lot of syntactic sugar and rules.) The biggest limitation, at the moment, is our lack of exception handling support in asm.js. In a future iteration, we are seriously considering adding a limited form of exception handling in asm.js; restrictive enough to allow the same zero-cost exception handling strategy used by C++ compilers [1].

[1] http://llvm.org/docs/ExceptionHandling.html#itanium-abi-zero-cost-exception-handling

ronny d said...

hello axel, I have two questions... 1) What are the differences compared to LLJS? Both are developed at Mozilla, and it seems hard for Mozilla to keep working on two really similar projects...

2) Will asm.js be simple to mix with JavaScript code, making it possible to combine both and to resolve JavaScript performance issues simply by switching to asm.js? Are there any known limitations?


thanks!
