2011-09-20

Quasi-literals: embedded DSLs in ECMAScript.next

Update 2012-08-04: The new name for quasi-literals is “template strings”. This and other recent changes are covered in the section “Recent changes”.

Quasi-literals [1] are a syntactic construct that facilitates the implementation of embedded domain-specific languages (DSLs) in JavaScript. They are currently slated for inclusion in the next version of ECMAScript [2]. This post explains how quasi-literals work.

Introduction

The idea is as follows: A quasi-literal (short: a quasi) is similar to a string literal and a regular expression literal in that it provides a simple syntax for creating data. The following is an example.
    quasiHandler`Hello ${firstName} ${lastName}!`
This is just a compact way of writing (roughly) the following function call:
    quasiHandler(["Hello ", " ", "!"], firstName, lastName)
Thus, the name before the content in backquotes is the name of a function to call, the quasi handler. The handler receives two different kinds of data:
  • Literal sections such as "Hello ".
  • Substitutions such as firstName (delimited by a dollar sign and braces). A substitution can be any expression.
Literal sections are known statically, substitutions are only known at runtime.
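To make the function-call reading concrete, here is a minimal sketch of a handler that simply interleaves the literal sections with the substitutions. The body of quasiHandler is an assumption for illustration (this post does not define it), and the snippet assumes an engine that implements the syntax described here.

```javascript
// Hypothetical handler: joins literal sections and substitutions in order.
function quasiHandler(literals, ...substitutions) {
    let result = literals[0];
    substitutions.forEach(function (value, i) {
        result += String(value) + literals[i + 1];
    });
    return result;
}

const firstName = "Jane";
const lastName = "Doe";

// The quasi-literal...
const greeting = quasiHandler`Hello ${firstName} ${lastName}!`;
// ...and (roughly) the function call it stands for.
const explicit = quasiHandler(["Hello ", " ", "!"], firstName, lastName);

console.log(greeting); // Hello Jane Doe!
console.log(explicit === greeting); // true
```

A handler is free to return anything, not just a string. The examples in the next section make use of that.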

Examples

Quasis are quite versatile, because a quasi-literal becomes a function call and because the text that that function receives is structured. Therefore, you only need to write a new function to implement a new domain-specific language. The following examples are taken from [1] (which you can consult for details):

Raw strings

Raw strings are string literals with multiple lines of text and no interpretation of escaped characters.
    var str = raw`This is a text
    with multiple lines.
    Escapes are not interpreted,
    \n is not a newline.`;
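A handler like raw could be sketched as follows. It relies on the raw property of the call site ID, which is described in the section “Implementing a handler”. The function body is an assumption for illustration, not the implementation from [1].

```javascript
// Hypothetical raw handler: uses the uninterpreted ("raw") literal parts.
function raw(callSite, ...substitutions) {
    let result = callSite.raw[0];
    substitutions.forEach(function (value, i) {
        result += String(value) + callSite.raw[i + 1];
    });
    return result;
}

const rawText = raw`first line
second line, \n stays as typed`;

// The line break in the source is kept, the escape \n is not interpreted.
console.log(rawText.split("\n").length); // 2
console.log(rawText.includes("\\n"));    // true
```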

Parameterized regular expression literals

There are two ways of creating regular expression instances.
  • Statically, via a regular expression literal.
  • Dynamically, via the RegExp constructor.
If you use the latter way, it is because you have to wait until runtime so that all necessary ingredients are available: You are usually concatenating regular expression fragments and text that is to be matched verbatim. The latter has to be escaped properly (dots, square brackets, etc.). By defining a regular expression handler re, we can help with this task:
    re`\d+(${localeSpecificDecimalPoint}\d+)?`
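A minimal sketch of such a handler, under the assumption that literal sections are regular expression source (taken raw, so that \d stays \d) and that every substitution is text to be matched verbatim. The helper escapeForRegExp is hypothetical and not part of [1].

```javascript
// Hypothetical helper: escapes characters that are special in a regular expression.
function escapeForRegExp(text) {
    return String(text).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Hypothetical re handler: raw literal parts form the pattern,
// substitutions are escaped so that they match verbatim.
function re(callSite, ...substitutions) {
    let source = callSite.raw[0];
    substitutions.forEach(function (value, i) {
        source += escapeForRegExp(value) + callSite.raw[i + 1];
    });
    return new RegExp(source);
}

const localeSpecificDecimalPoint = ".";
const numberPattern = re`^\d+(${localeSpecificDecimalPoint}\d+)?$`;

console.log(numberPattern.test("3.14")); // true
console.log(numberPattern.test("3x14")); // false, because the dot was escaped
```

Without the escaping step, the dot would match any character and "3x14" would be accepted.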

Query languages

Example:
    $`a.${className}[href=~'//${domain}/']`
This is a DOM query that looks for all <a> tags whose CSS class is className and whose target is a URL with the given domain. The quasi handler $ ensures that the arguments are correctly escaped, making this approach safer than manual string concatenation.

Text localization (L10N)

There are two components to L10N: first the language and second the locale (how to format numbers, time, etc.). Consider the following message:
    alert(msg`Welcome to ${siteName}, you are visitor
              number ${visitorNumber}:d!`);
The handler msg would work as follows.
  • The literal parts are concatenated to form a string that can be used to look up a translation in a table. An example for a lookup string is:
        "Welcome to {0}, you are visitor number {1}!"
    
    An example for a translation to German is:
        "Besucher Nr. {1}, willkommen bei {0}!"
    
    The English “translation” would be the same as the lookup string.
  • Next, the result from the lookup is used to display the substitutions. Because a lookup result includes indices, it can rearrange the order of the substitutions. That has been done in German, where the visitor number comes before the site name. How the substitutions are formatted can be influenced via annotations such as :d. This annotation means that visitorNumber should be formatted as a decimal number with locale-specific separators. Thus, a possible English result is:
        Welcome to ACME Corp., you are visitor number 1,300!
    
    In German, we have results such as:
        Besucher Nr. 1.300, willkommen bei ACME Corp.!
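The two steps above can be sketched as follows. Everything here is an assumption made for illustration: the factory makeMsg (the post uses a single handler msg), the translation table, the separator table and the helper formatDecimal. Only the digit grouping shown in the examples above is modeled.

```javascript
// Hypothetical translation table, keyed by language and lookup string.
const translations = {
    de: {
        "Welcome to {0}, you are visitor number {1}!":
            "Besucher Nr. {1}, willkommen bei {0}!"
    }
};

// Hypothetical locale data: separator between groups of three digits.
const groupSeparators = { en: ",", de: "." };

// Formats a non-negative integer for the annotation :d.
function formatDecimal(number, locale) {
    return String(number).replace(/\B(?=(\d{3})+$)/g, groupSeparators[locale]);
}

function makeMsg(locale) {
    return function msg(literals, ...substitutions) {
        const formats = [];
        let key = literals[0];
        substitutions.forEach(function (_, i) {
            // An annotation such as :d sits at the start of the next literal part.
            const next = literals[i + 1];
            const match = /^:([a-z])/.exec(next);
            formats.push(match ? match[1] : null);
            key += "{" + i + "}" + (match ? next.slice(match[0].length) : next);
        });
        // Step 1: look up the translation (line breaks in the source are collapsed).
        key = key.replace(/\s+/g, " ");
        const pattern = (translations[locale] || {})[key] || key;
        // Step 2: fill in the substitutions, in the order the translation wants.
        return pattern.replace(/\{(\d+)\}/g, function (_, index) {
            const value = substitutions[index];
            return formats[index] === "d" ? formatDecimal(value, locale) : String(value);
        });
    };
}

const siteName = "ACME Corp.";
const visitorNumber = 1300;
const english = makeMsg("en")`Welcome to ${siteName}, you are visitor
          number ${visitorNumber}:d!`;
const german = makeMsg("de")`Welcome to ${siteName}, you are visitor
          number ${visitorNumber}:d!`;

console.log(english); // Welcome to ACME Corp., you are visitor number 1,300!
console.log(german);  // Besucher Nr. 1.300, willkommen bei ACME Corp.!
```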
    

Secure content generation

With quasis, one can make a distinction between trusted content coming from the program and untrusted content coming from a user. For example:
    safehtml`<a href="${url}">${text}</a>`
The literal sections come from the program, the substitutions url and text come from a user. The quasi handler safehtml can ensure that no malicious code is injected via the substitutions. For HTML, the ability to nest quasis is useful:
    var rows = [['Unicorns', 'Sunbeams', 'Puppies'], ['<3', '<3', '<3']];
    safehtml`<table>${
        rows.map(function(row) {
            return safehtml`<tr>${
                row.map(function(cell) {
                    return safehtml`<td>${cell}</td>`
                })
            }</tr>`
        })
    }</table>`
Explanation: The rows of the table are produced by an expression – the invocation of the method rows.map(). The result of that invocation is an array of strings that are produced by recursively invoking a quasi. safehtml concatenates those strings and inserts them into the surrounding literal sections. The cells for each row are produced in the same manner, via row.map().
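One way to make the nesting work is sketched below. It deviates from the description above in one respect, which is an assumption of this sketch: instead of plain strings, safehtml returns a wrapper object (the hypothetical SafeHtml), so that the handler can tell markup it produced itself from untrusted text. The sketch only escapes HTML special characters; it does not validate URLs in attributes such as href.

```javascript
// Hypothetical wrapper that marks markup as already escaped (trusted).
function SafeHtml(markup) {
    this.markup = markup;
}
SafeHtml.prototype.toString = function () {
    return this.markup;
};

function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Trusted values pass through, arrays are concatenated, everything else is escaped.
function toMarkup(value) {
    if (value instanceof SafeHtml) return value.markup;
    if (Array.isArray(value)) return value.map(toMarkup).join("");
    return escapeHtml(value);
}

function safehtml(literals, ...substitutions) {
    let markup = literals[0];
    substitutions.forEach(function (value, i) {
        markup += toMarkup(value) + literals[i + 1];
    });
    return new SafeHtml(markup);
}

const rows = [["Unicorns", "Sunbeams", "Puppies"], ["<3", "<3", "<3"]];
const table = safehtml`<table>${
    rows.map(function (row) {
        return safehtml`<tr>${
            row.map(function (cell) {
                return safehtml`<td>${cell}</td>`;
            })
        }</tr>`;
    })
}</table>`;

console.log(String(table));
// <table><tr><td>Unicorns</td><td>Sunbeams</td><td>Puppies</td></tr><tr><td>&lt;3</td><td>&lt;3</td><td>&lt;3</td></tr></table>
```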

Templates

Templates are very similar to quasis, in that they are text with holes in them. But one normally uses objects (e.g. JSON data) to fill in the holes. For example, the following is a template:
    <h1>${{title}}</h1>
    ${{content}}
Using a quasi instead of a string literal to define this text has two advantages: Quasis do the parsing for you and a quasi can comprise multiple lines. A template would be defined as follows:
    var myTmpl = tmpl`
    <h1>${{title}}</h1>
    ${{content}}
    `;
This works, because {title} and {content} are actual ECMAScript.next expressions: {foo,bar} is syntactic sugar for {foo: foo, bar: bar}. Thus, the handler will receive a value such as { title: undefined } for the first substitution. With templates, the handler is not interested in the value of title, just in its name and this trick lets it access it. A disadvantage of using a quasi in this manner is that variables such as title and content have to exist (but they don’t have to have a value). Therefore, the above must be written as
    var title, content;
    var myTmpl = tmpl`
    ...
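A handler tmpl could be sketched like this. The assumption (not spelled out above) is that the handler returns a function that fills the holes from a data object; the names myTmpl, title and content are taken from the example, the rest is hypothetical.

```javascript
// Hypothetical tmpl handler: each substitution is an object such as
// { title: undefined }; only the name of its single property is used.
function tmpl(literals, ...holes) {
    const names = holes.map(function (hole) {
        return Object.keys(hole)[0];
    });
    // The template is a function that fills the holes from a data object.
    return function (data) {
        let result = literals[0];
        names.forEach(function (name, i) {
            result += String(data[name]) + literals[i + 1];
        });
        return result;
    };
}

var title, content; // must exist, but don't need a value
var myTmpl = tmpl`<h1>${{title}}</h1>
${{content}}`;

console.log(myTmpl({ title: "Quasis", content: "Text with holes." }));
// <h1>Quasis</h1>
// Text with holes.
```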

Implementing a handler

The following is a quasi-literal:
    handlerName`lit1\n${subst1} lit2 ${subst2}`
This is transformed internally to a function call (adapted from [1]):
    // Hoisted: call site ID
    // “cooked”, newline interpreted
    const callSiteId1234 = ['lit1\n', ' lit2 ', ''];
    // “raw”, newline verbatim
    callSiteId1234.raw = ['lit1\\n', ' lit2 ', ''];

    // In-situ: handler invocation
    handlerName(callSiteId1234, subst1, subst2)
The parameters of the handler are split into two categories:
  1. The call site ID (callSiteId1234 above), which provides the literal parts both with escapes such as \n interpreted (“cooked”) and uninterpreted (“raw”). The number of literal parts is always one plus the number of substitutions. If a substitution comes first in a quasi-literal, it is preceded by an empty literal part. If a substitution comes last, it is followed by an empty literal part (as in the example above).
  2. The substitutions, whose values become trailing parameters.
The idea is that the same literal might be executed multiple times (e.g. in a loop); with the callSiteID, the handler can cache data from previous invocations. (1) is potentially cacheable data, (2) changes with each invocation.
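The caching idea can be sketched as follows, assuming (as described above) that the same call site ID object is passed each time the same quasi-literal is executed. The handler cachingHandler and the “expensive” preparation step (here merely upper-casing the literal parts) are stand-ins invented for this example.

```javascript
// Hypothetical cache: maps each call site ID to data derived from its literal parts.
const cache = new WeakMap();
let preparations = 0;

function cachingHandler(callSite, ...substitutions) {
    // The number of literal parts is always one plus the number of substitutions.
    if (callSite.length !== substitutions.length + 1) {
        throw new Error("Unexpected number of literal parts");
    }
    let prepared = cache.get(callSite);
    if (!prepared) {
        preparations++;
        // Stand-in for expensive work (parsing, validation, ...) on the static parts.
        prepared = callSite.map(function (part) {
            return part.toUpperCase();
        });
        cache.set(callSite, prepared);
    }
    let result = prepared[0];
    substitutions.forEach(function (value, i) {
        result += String(value) + prepared[i + 1];
    });
    return result;
}

const outputs = [];
for (let i = 0; i < 3; i++) {
    outputs.push(cachingHandler`item ${i};`); // same call site on each iteration
}

console.log(outputs.join(" ")); // ITEM 0; ITEM 1; ITEM 2;
console.log(preparations);      // 1
```

The literal parts are processed once, while the substitutions are read anew on every invocation.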

Assigning to substitutions

An extended version of quasis (that probably won’t be part of ECMAScript.next) allows one to assign to substitutions. For example:
    if (re_match`before (${=x}\d+) after`(myString)) {
        // Do something with x
    }
re_match creates a function which is immediately invoked on myString. That function returns true if myString is a match and assigns the first matching group to the variable x at the same time. Compare the above to the equivalent quasi-less JavaScript code below. Note that you need an extra variable to hold the match.
    var match = /before (\d+) after/.exec(myString);
    if (match) {
        x = match[1];
        // Do something with x
    }
To make a substitution assignable, the following translation happens: Each writable substitution ${=x} is passed to the handler as the following function (other writable substitutions such as ${=obj.prop} work the same).
    function () { return arguments.length ? (x = arguments[0]) : x }
Explanation: If you call this function with no arguments, you get the value of the substitution. If you provide an argument, it is assigned to the substitution.

Each read-only substitution ${x} is passed to the handler as a function.

    function() { return x }
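Since the syntax ${=x} is only proposed, the following sketch writes out the accessor function by hand and passes it as an ordinary substitution. The implementation of re_match is an assumption for illustration. It further assumes that every writable substitution directly follows the opening parenthesis of a capturing group and that the pattern contains no other capturing groups.

```javascript
// Hypothetical re_match handler (see the assumptions stated above).
function re_match(callSite, ...accessors) {
    // Substitutions contribute no pattern text; only the raw literal parts do.
    const regex = new RegExp(callSite.raw.join(""));
    return function (text) {
        const match = regex.exec(text);
        if (!match) return false;
        accessors.forEach(function (accessor, i) {
            accessor(match[i + 1]); // calling with an argument assigns
        });
        return true;
    };
}

let x;
const myString = "before 123 after";

// ${=x} is replaced by the accessor function it would be translated to.
const matched = re_match`before (${function () {
    return arguments.length ? (x = arguments[0]) : x;
}}\d+) after`(myString);

console.log(matched, x); // true 123
```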

Recent changes

In July 2012, several changes were made to quasi-literals:
  1. They are no longer called “quasi-literals”. The name “template strings” has been proposed and a few other suggestions are also being debated.
  2. Braces must always follow a dollar sign ($); you can no longer omit them around an identifier.
  3. The call site ID is now an array with the cooked literal parts. The raw literal parts are stored in a property raw of that array. The rationale is that the cooked parts will be used much more often.
The post has been updated to reflect changes (2) and (3).

Conclusion

As you can see, there are many applications for quasi-literals. You might wonder why ECMAScript.next does not introduce a full-blown macro system. That is because it is quite difficult to create a macro system for a language whose syntax is as complex as JavaScript’s. This task will thus take more time and, possibly, research. There is hope, though: With much luck, we will see macros in ECMAScript 8 [3].

Acknowledgement. Thanks to Brendan Eich, Mark S. Miller, Mike Samuel, and Allen Wirfs-Brock for answering my quasis-related questions on the es-discuss mailing list.

References

  1. ECMAScript Quasi-Literals [proposal for ECMAScript.next]
  2. ECMAScript.next: the “TXJS” update by Eich
  3. A first look at what might be in ECMAScript 7 and 8
