2012-12-11

Using ES6 template strings for regular expressions

ECMAScript 6 template strings [1] give us multi-line string literals and interpolation. But they can also be tagged, in which case a library can determine the meaning of their content and has access to the raw text, in which a backslash has no special meaning. Steven Levithan has recently given an example of how they could be used for his regular expression library XRegExp.
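
To make the difference between the processed and the raw view of a template string concrete, here is a minimal sketch in the syntax that ES6 eventually standardized. The tag function `show` is a hypothetical name used only for illustration; `String.raw` is the built-in tag that returns the raw text.

```javascript
// Hypothetical tag function, for illustration only:
// it simply returns what the engine hands to a tag.
function show(strings, ...values) {
    return {
        cooked: Array.from(strings),   // escape sequences already interpreted
        raw: Array.from(strings.raw),  // text exactly as typed in the source
        values: values                 // results of the ${...} expressions
    };
}

const n = 3;
const result = show`a\tb${n}c`;

console.log(result.cooked);  // first element contains a real tab character
console.log(result.raw);     // first element contains a backslash followed by "t"
console.log(result.values);  // the interpolated number 3

// The built-in raw tag keeps the backslash, so no doubling is needed:
console.log(String.raw`\.html?`);
```

A regular expression tag would work from `strings.raw`, which is why backslashes do not have to be typed twice.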

(As an aside, XRegExp is highly recommended if you are working with regular expressions. You get many advanced features, but there is only a small performance penalty – once at creation time – because XRegExp compiles its input to native regular expressions.)

Without template strings, you write code such as the following:

    var parts = '/2012/10/Page.html'.match(XRegExp(
        '^ # match at start of string only \n' +
        '/ (?<year> [^/]+ ) # capture top dir name as year \n' +
        '/ (?<month> [^/]+ ) # capture subdir name as month \n' +
        '/ (?<title> [^/]+ ) # capture file name without ext as title \n' +
        '\\.html? $ # .htm or .html file ext at end of path ', 'x'
    ));

    console.log(parts.year); // 2012

We can see that XRegExp gives us named groups (year, month, title) and the x flag. With that flag, most whitespace is ignored and comments can be inserted. On the downside, we have to type every regular expression backslash twice, to escape it for the string literal. And it is cumbersome to enter multiple lines: instead of concatenating strings, you could also end each line with a backslash, but that is brittle and you still have to explicitly add newlines via \n. These two problems go away with template strings:

    var parts = '/2012/10/Page.html'.match(XRegExp.rx`
        ^ # match at start of string only
        / (?<year> [^/]+ ) # capture top dir name as year
        / (?<month> [^/]+ ) # capture subdir name as month
        / (?<title> [^/]+ ) # capture file name without ext as title
        \.html? $ # .htm or .html file ext at end of path
    `);

Template strings also let you insert values v via ${v}. I’d expect a regular expression library to escape strings and to insert regular expressions verbatim. For example:

    var str   = 'really?';
    var regex = XRegExp.rx`(${str})*`;

This would be equivalent to:

    var regex = XRegExp.rx`(really\?)*`;

One could also make the parens around ${str} optional.
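
The interpolation rule described above (escape strings, insert regular expressions verbatim) could be sketched roughly as follows. This is not XRegExp's implementation: `rx` and `escapeRegExp` are hypothetical helpers built on the native `RegExp` only, so the sketch supports neither the x flag nor named groups, both of which XRegExp would supply.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(str) {
    return str.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

// Hypothetical stand-in for XRegExp.rx, using native RegExp only.
function rx(strings, ...values) {
    // Work from the raw text so that backslashes need not be doubled.
    let source = strings.raw[0];
    values.forEach((value, i) => {
        source += (value instanceof RegExp)
            ? value.source                  // regular expressions: verbatim
            : escapeRegExp(String(value));  // everything else: escaped as text
        source += strings.raw[i + 1];
    });
    return new RegExp(source);
}

const str = 'really?';
const regex = rx`(${str})*`;
console.log(regex.source);  // (really\?)*

// A regular expression value is inserted unchanged:
console.log(rx`^${/\d+/}\.html?$`.test('2012.html'));  // true
```

Making the parentheses optional would mean wrapping each escaped string in a non-capturing group, so that a following quantifier applies to the whole string rather than to its last character.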

Related blog post:

  1. Quasi-literals: embedded DSLs in ECMAScript.next
