ParallelJS: data parallelism for JavaScript

[2013-12-23] jsfuture, dev, javascript, concurrency

Updates:

[2015-01-05] ParallelJS has failed. It apparently wasn’t powerful enough. Its experimental implementation will be removed from SpiderMonkey (Firefox’s JavaScript engine).
[2013-12-28] A low-level JavaScript API for SIMD is another avenue for parallelization (within a single processer core).

JavaScript is still a very sequential language. That is slowly changing. This blog post describes ParallelJS, an effort to bring data parallelism to JavaScript.

Background

Concurrency versus parallelism

Parallelism: more than one task is executed at the same time. That means that the tasks run on different processors or processor cores.
Concurrency: more than one task makes progress at the same time. Often, tasks depend on each other. Concurrent tasks may run in parallel, but they can also be run via time slicing (virtual parallelism, if you will).

These terms are not always used precisely, but the context usually makes it clear what is meant.

Kinds of parallelism

Two models of parallelism are:

Data parallelism: The same piece of code is executed several times in parallel. The instances operate on different elements of the same dataset. For example: MapReduce is a data-parallel programming model.
Task parallelism: Different pieces of code are executed in parallel. Examples: Web Workers and the Unix model of spawning processes.

ParallelJS tackles data parallelism.

ParallelJS (PJS)

ParallelJS started its life as River Trail, a JavaScript language extension for Firefox by Intel. River Trail provides a new constructor, ParallelArray with iteration methods whose callbacks are executed concurrently.

ParallelJS brings the RiverTrail idea to normal arrays and has recently been included in Firefox Nightly. It adds three methods to arrays (binary data will eventually get those methods, too):

Array.prototype.mapPar()
Array.prototype.filterPar()
Array.prototype.reducePar()

These methods work like the traditional array methods map(), filter() and reduce(), but the order in which the elements are processed is undefined (with the non-par versions, you know exactly in which order they are processed). That enables concurrent execution of multiple instances of the callback. However, not all callbacks can be run concurrently. A callback must fulfill certain requirements:

It must not mutate shared data. It can freely change data that it has created, but can only read data from the outside world.
Many host (or “native”) objects can’t be used. That includes the DOM, which may never be supported and regular expressions for which short- to mid-term support is likely. All vanilla JavaScript objects are fine, though.

PJS code is executed like this: It is first run sequentially and observed. The requirements for parallelism are checked. If they are fulfilled, parallel code is produced and run in parallel. If any of the requirements are not fulfilled anymore, then PJS reverts to sequential execution.

The rules for when PJS code can be run concurrently will probably never be standardized, because a PJS standard needs to accomodate a variety of implementation approaches (see below). There will, however, be guidelines and recommendations for what works for most implementations.

Given how difficult it is to predict whether your code can be run concurrently or not, an important part of PJS will be diagnostic tools. They will let you find out when and why code is not parallelizable.

ParallelJS is currently tentatively scheduled for ECMAScript 8. If you are worried that that’s too far into the future: ECMAScript 7 and ECMAScript 8 will be smaller and developed more quickly than ECMAScript 6.

Modes of execution

There are several conceivable ways of executing PJS code:

Sequentially: a shim can make the PJS array methods available on older engines
Concurrently: run multiple instances of the callback concurrently. The operating system can distribute those instances across cores.
SIMD: use SIMD processor instructions (such as Intel’s SSE) for faster processing.
GPU: run vectorized code, but on the GPU instead of the CPU.

The current PJS implementation supports #2. A shim for older engines is available, too. #3 and #4 are being explored but will probably take a while to be implemented.

Task parallelism

Mozilla is also considering added task parallelism to JavaScript; fork-join style, via functions to whom rules apply that are similar to PJS’s (don’t mutate shared state, etc.). The two approaches are very similar, you could probably implement a simple version of PJS via the task parallelism primitives. Therefore, the task parallelism exensions will profit greatly from the work done for PJS. This new kind of task parallelism would be more lightweight than Web Workers and be added to the language proper (as opposed to be part of the web platform).

Sources

This post is based on the following blog posts by Mozilla researcher Nicholas D. Matsakis.

ParallelJS status quo and future: “PJS Roadmap”
What ParallelJS code can be parallelized: “Parallelizable JavaScript Subset”