The idea is obvious: Why not standardize the bytecode of the virtual machines (VMs) that JavaScript runs on? That would mean that JavaScript programs could be delivered as bytecode and thus would be smaller and start more quickly (after having been loaded). Additionally, it would seem to be easier to port other languages to web browsers, by targeting that bytecode. This post makes its case in two steps: First, it shows that bytecode has several disadvantages. Second, it explains that source code is not as bad a solution as it seems.
The disadvantages of bytecode
Bytecode is a very specialized mechanism:
- There is no single bytecode to “rule all languages”: A good bytecode is intimately tied to the language that is most frequently compiled to it. It is thus impossible to define a bytecode that works well with all languages, especially if you want to support both dynamic and static languages.
- There is no common ground between browsers: The previous rule even applies to the competing implementations of the same language, JavaScript. They are too different for a common bytecode to be found: Firefox, Safari and Internet Explorer each use a different bytecode, while Google’s V8 initially compiles directly to machine code. But wouldn’t it be possible to work towards the goal of a common bytecode or to adopt a single implementation in the long run? Doing so would indeed have some advantages. But having several implementations of the same language is also useful, because different approaches can be tried. Competition between engines has so far been very good for the JavaScript ecosystem. V8 started a race that hasn’t ended yet and that has brought tremendous speed gains to JavaScript.
- Bytecode is inflexible: It ties you to the current version of the language and to implementation details such as how data is encoded. Especially with regard to language versions, you need to be flexible on the web, where you have many combinations of
language version(s) sent by the server × language versions supported by the browser
Quoting Brendan Eich [1]:
Now, of course, you could say “Let’s version the bytecode”, and then you’re in version hell. The web really doesn’t like to have that kind of versioning. There’s a saying in the WhatWG that versioning is an Anti-Pattern and I agree we should avoid brittle a priori versioning, or heavy-handed versioning. If you look at Flash, it’s gotten into a situation where it has to support versions going back to Flash 4. They have to ship ActionScript 2 as a separate interpreter along with Tamarin. This is the hard row you hoe when you do make detailed choices in a lower-level bytecode, I think, and when you simply have an installed base that can’t be upgraded or doesn’t use a common source language.
Source code is not that bad – it’s meta-bytecode
At first glance, it seems like a suboptimal solution to use source code to deliver programs. At second glance, it has the benefit of flexibility and, with a little work, it can obtain much of the efficiency of bytecode.
- Source code abstracts over different language implementations: JavaScript source code is remarkable in how many closely compatible virtual machines there are for it (browser incompatibilities are another issue!). That is due to several factors: First, with ECMA-262 (“ECMAScript”), JavaScript has a very well written language specification (especially compared to Dart’s which – to be fair – is evolving). Whenever you have a doubt about a language feature, you can turn to ECMA-262 and get a clear answer. Second, JavaScript engine vendors work closely together to evolve the language. Third, there is a test suite called test262 that checks conformance of a JavaScript implementation. Hence, you can consider JavaScript source code to be meta-bytecode – a data format that unifies the different bytecode formats and V8’s machine code.
- Source code abstracts over different language versions: Keeping the delivery format of a new language version backward compatible is easier with source code than it is with bytecode.
- Parsing source code is fast: JavaScript engines have become very efficient at parsing JavaScript source code. Coupled with increased CPU speed, the overhead caused by parsing is becoming less and less important.
- Source can be quite compact: There are two ways of making source code more compact. First, minification – a transformation of source code that maintains the semantics while decreasing the size. For example, minification removes comments and changes variable names to be shorter. Second, compression. After minification, one can apply a compression algorithm such as gzip to achieve further reductions in size.
- Already a good compilation target: JavaScript source code having such a high level of abstraction makes it relatively easy to compile to. Furthermore, being a good compilation target is a consideration in JavaScript’s evolution. Examples of features that are partially motivated by that consideration are: typed arrays (supported by many modern browsers, proposed for a future ECMAScript version) and SIMD (which might be part of ECMAScript 8 [2]). Lastly, JavaScript engines increasingly support this use case. For example, via source maps [3]: If a file A is compiled to a JavaScript file B, then B can be delivered with a source map. Whenever a source code location is reported for B (e.g. in an error message) then it can be traced back to A, via the source map. In the future, source maps will even allow one to debug JavaScript code in the original language.
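The two size-reduction steps described above (minification, then compression) can be sketched in Node.js. This is only a toy under stated assumptions: `naiveMinify` is a hypothetical helper that strips line comments and collapses whitespace with regular expressions, whereas real minifiers parse the code and also shorten variable names. The sample function is made up for illustration.

```javascript
const zlib = require("node:zlib");

// Made-up sample program; every statement ends with a semicolon so that
// collapsing newlines cannot change its meaning.
const source = `
// Compute the sum of all numbers in an array.
function sumOfNumbers(listOfNumbers) {
    let runningTotal = 0;
    for (const currentNumber of listOfNumbers) {
        runningTotal = runningTotal + currentNumber;
    }
    return runningTotal;
}
`;

// Hypothetical toy minifier: it would break code that contains "//"
// inside a string literal, which a parser-based minifier handles correctly.
function naiveMinify(code) {
  return code
    .replace(/\/\/.*$/gm, "") // drop line comments
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
}

const minified = naiveMinify(source);
// Second step: gzip the minified text, as a web server would before sending it.
const gzipped = zlib.gzipSync(Buffer.from(minified, "utf8"));

console.log("original bytes:      ", Buffer.byteLength(source, "utf8"));
console.log("minified bytes:      ", Buffer.byteLength(minified, "utf8"));
console.log("minified+gzip bytes: ", gzipped.length);
```

On a snippet this small, gzip’s fixed header overhead can eat up most of the savings; the benefit of compression shows on files of realistic size.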
The remaining bytecode advantage
The main remaining bytecode advantage is that (static and dynamic) analyses can be performed ahead of time and delivered alongside the bytecode. The closest to bytecode one can get without losing the advantages of source code is to use the abstract syntax tree (AST) produced by a parser. The research project JSZap [4] does just that:
In this paper, we consider reducing the JavaScript source code to a compressed abstract syntax tree (AST) and transmitting the code in this format.
The AST is complemented by the result of several analyses. Such a format could become a standard JavaScript storage format. The advantages of the JSZap approach are:
- Faster parsing and well-formedness checking (including security checks).
- Reduced program size (by approximately 10% compared to minification plus gzip compression).
- Some JavaScript code is currently loaded synchronously via script tags embedded in HTML. With JSZap, the HTML parser can load such code asynchronously whenever the JSZap data indicates that it doesn’t interact with the DOM. The main examples are libraries. This is mainly an optimization for older JavaScript applications. Modern applications load all library code asynchronously.
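To make the AST idea more concrete, here is a minimal sketch of what delivering a tree instead of text could look like. The node shapes follow the ESTree convention used by many JavaScript parsers; that choice is an assumption for illustration, and JSZap’s actual compressed format is not reproduced here. `evaluate` is a hypothetical helper.

```javascript
// Hand-written, ESTree-style AST for the expression `1 + 2 * 3`.
// A receiver of this data never has to tokenize or parse text.
const ast = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: 1 },
  right: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "Literal", value: 2 },
    right: { type: "Literal", value: 3 },
  },
};

// Hypothetical helper: one tree walk both checks well-formedness
// (unknown node types or operators are rejected) and evaluates the result.
function evaluate(node) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "BinaryExpression": {
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      if (node.operator === "+") return left + right;
      if (node.operator === "*") return left * right;
      throw new SyntaxError("Unsupported operator: " + node.operator);
    }
    default:
      throw new SyntaxError("Unknown node type: " + node.type);
  }
}

console.log(evaluate(ast)); // 7
```

Operator precedence is already encoded in the shape of the tree, which is one reason why checking and loading an AST can be cheaper than parsing the equivalent source text.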
Dart’s snapshots are an extreme kind of ahead-of-time analysis that can probably not be duplicated by a cross-VM format. They improve application startup time. Quoting “The Essence of Google Dart: Building Applications, Snapshots, Isolates” by Werner Schuster for InfoQ:
... the heap snapshot feature ... is similar to Smalltalk’s image system. An application’s heap is walked and all objects are written to a file. At the moment, the Dart distribution ships with a tool that fires up a Dart VM, loads an application’s code, and just before calling main, it takes a snapshot of the heap. The Dart VM can use such a snapshot file to quickly load an application.
Conclusion
I hope that this post has convinced you that delivering JavaScript programs as source code is not as different as it seems from delivering them as bytecode, especially when size reduction techniques are used. Moreover, while source code takes up more space and loads more slowly, it is also more flexible than bytecode – a trait that is very valuable on the web.
Related reading
- [1] “Bytecode Standard In Browsers” – A Minute With Brendan Eich
- [2] A first look at what might be in ECMAScript 7 and 8
- [3] SourceMap on Firefox: source debugging for languages compiled to JavaScript [update: WebKit, too]
- [4] “JSZap: Compressing JavaScript Code”, by Martin Burtscher, Benjamin Livshits, Gaurav Sinha, Benjamin G. Zorn. Microsoft Research, 2010.
17 comments:
It doesn't seem _that_ hard to tell whether the code interacts with the DOM. Just check the metadata and see if it says yes or no. Then when you run it, if it interacts with the DOM and it said it would not, you throw an exception.
"V8 started a race"
? As far as I can remember, Safari and Firefox had their JIT engines months before the announcement of V8.
If there's metadata, why do you need bytecode? You could just use an html attribute on the script tag for that.
My memory is that V8 was the first JavaScript engine that refuted the notion that JavaScript was slow. At its introduction, it was significantly faster than everything else out there. I might be wrong, so if there is evidence to the contrary, please point me to it.
This is just a compendium of poorly reasoned opinions, non sequiturs and plain bullshit. Sorry, but you lose at the internet.
Please be specific: where do you disagree with what I’m writing? Otherwise, you are just trolling.
Nope, the burden of proof is on you. You made the statement, you prove it.
He means allow authors to specify if the script will modify the DOM. Then proceed according to this promise and throw an error if the author lied.
It has absolutely nothing at all to do with bytecode.
I’m assuming we’re all friends here, so asking him whether he might have more information is natural. After some googling, I found the following article: http://waynepan.com/2008/09/02/v8-tracemonkey-squirrelfish-ie8-benchmarks/
I must say that I am disappointed in the tone that some people take in their comments. I'd like to think that this is a tone that would not be taken if these discussions were taken face to face. So please stop the trolling and give courteous feedback, for or against.
* There is no single bytecode to “rule all languages”
JVM, LLVM, and others prove you can have many different languages compile to the same bytecode. Also, DOM, events, and other characteristics of web client-side scripting eliminate incompatible languages from the get-go. Finally, "a few" languages is more than one.
* There is no common ground between browsers
I can't remember what exactly, but I'm sure there's one thing that all browsers have in common (something to do with a parser?). Though I can't help but see your point about different approaches to bytecode. In light of the fact that everybody is forced to implement JavaScript, and anyone else is forced to code in (or generate) - JavaScript [hint].
* Bytecode is inflexible: it ties you to the current version
Say each version of ECMAScript was a version of ECMAVirtualMachine. There would be at most 5 so far that each browser should implement. At least we'd be able to break backwards compatibility. Brendan Eich doesn't want us to version. He wants us to test for the presence of every single feature... I rest my case.
You may like JavaScript. I know about 10 other languages, and I don't. I like prototypes. I like closures. I like evented programming. But I don't like JavaScript, because all other things in it are broken. The client-side scripting monopoly must end. A VM is the only way to go.
(1) The JVM is a good example. It is quite good at running JavaScript (and getting better with invokedynamic). But V8 is even better.
(3) I’m torn when it comes to versioning. It allows you to clean up, but it also results in multiple dialects existing on the web and the need to prefix each piece of source code with a version. ECMAScript.next is doing OK without resorting to it. Eventually, there might be a need for it, we’ll see how it goes. It will still be easier without bytecode.
Languages I’ve written substantial amounts of code in: 6502 assembler, Basic, Pascal, Scheme, Prolog, Haskell, Python, C, C++, Java, Perl, Maude. I’m also loosely familiar with Common Lisp, Self and Smalltalk. So I do know that JavaScript is not perfect. But it’s something that all browser vendors agree on. It's also one of the most open programming languages: several open implementations, a good spec, no single company or person dominating it. And being able to directly create objects, change them dynamically and push them to a JSON database (such as MongoDB or CouchDB) with almost no impedance mismatch is really convenient. You have to learn the quirks and to learn to live with them, but in return you get something that is one of the most compelling solutions that *currently* exist. And ECMAScript.next looks promising so far.
“The client-side scripting monopoly must end.”
One of the goals for ES.next is to make JavaScript a better compilation target. Combined with source maps [3], you will have other viable options. CoffeeScript is already quite popular. If a JS engine has bytecode, you are still eventually targeting that bytecode, you are just taking a detour via the source code.
There is no single bytecode to “rule all languages”
Then same go for JavaScript. It is very very very sucks language which missing every performance or constraint feature
So not much language can convert to it perfectly, even Java
There are many kind of programmer in this world and there are many code style and convention
There is no common ground between browsers
No they have. Currently now there are "HTML" "JavaScript" and "CSS" there is nothing to stop anyone to make all of these be binary, just not text
Actually the very very common ground of them is, setting response header to "text/plain" and every browser must treat \n as new line
Actually you can just look at binary is just a new JavaScript. just think if JavaScript is just not C style but being an assembly. Then browser will try to compete each other to translate those thing with their engine and that's all
Bytecode is inflexible
And JavaScript more strict to it only easiest way to do thing. And when you need just some more thing to control it nicely, it's all about throwing shit out
You just see, no JavaScript can run as fast as Java or C#, same go for python, cost for dynamic overhead in the place we don't need is too much. And just let we can control
But for it "Easiness" it cannot do any real shit we just want, What flexible you find?
Source code is not that bad – it’s meta-bytecode
You are so wrong in many level. Byte code is Meta-SourceCode and that's what it should be
Source can be quite compact:
minification – The most optimized minification is to compiled it all to byte code with all number translated to binary and 2 characters can cover 8 digit of number and compression these data is way more compact
Already a good compilation target
Nope. It inflexible for compilation target. for instance You cannot make struct in C# and compiled to it
The remaining bytecode advantage
Is more than you think. It cover every good side of obfuscated JavaScript with more performance. And very higher performance if it have added with performance feature such as struct and memory overlaying to stream
The most benefit is you can abstract many feature in the bytecode and let each language adopt that feature in the way it like. When it all bytecode then you don't need to care about complexity
The remaining JavaScript advantage
I could say it has just debugging it while at runtime. If we obfuscated it then it no different instantly
Conclusion
Look at CLR and CLI, ECMA-335, and see what "bytecode standard" should look like. Many version is just add feature and backward compatibility. Something deprecated if it really no needed
Yes, it's mapping to C# mostly but if any people try to work with other language then it can "Grow" to anything it's needed. But if people (like you) try to make an excuse to avoid it then it will stop growing
Standard can grow, if people use it
PS
What all you write is not "JavaScript Myth" more like "Myth about Bytecode Devil and JavaScript God"
As I understand it, your basic argument is: Why not just use CLR bytecode and be done with it? But that is simply not realistic. Current JavaScript engines differ so widely in their approaches that there would be no common ground for a standardized bytecode. You could – in theory – adopt one that would make other languages run better, but then it would run web apps slower, which is not what you want at the moment. Even Microsoft does not use the CLR to run JavaScript, they have a specialized engine for it.
“Then same go for JavaScript. It is very very very sucks language which missing every performance or constraint feature”
Sorry, but “JavaScript sucks” is too much of a blanket statement.
“So not much language can convert to it perfectly, even Java”
GWT works really well.
- There is no common ground between browsers
“No they have. Currently now there are ‘HTML’ ‘JavaScript’ and ‘CSS’ there is nothing to stop anyone to make all of these be binary, just not text.”
Please read the section again, you might have misunderstood it. That is not what I meant.
“You just see, no JavaScript can run as fast as Java or C#”
Take a look at how V8 works: If you stick to a programming style that is non-dynamic, it can generate very fast code. It even infers classes under the hood!
- Source code is not that bad – it’s meta-bytecode
“You are so wrong in many level. Byte code is Meta-SourceCode and that's what it should be”
Source code is just one more intermediate step. On JavaScript engines that internally use bytecode, you eventually end up with bytecode. At the moment, Java’s bytecode verifier (which is run after loading the bytecode) often takes longer to run than JavaScript’s parser.
- The remaining JavaScript advantage
“I could say it has just debugging it while at runtime. If we obfuscated it then it no different instantly”
I agree!
“What all you write is not ‘JavaScript Myth’ more like ‘Myth about Bytecode Devil and JavaScript God’”
No, it’s about the question: “Does one more intermediate step from source code to bytecode matter?” (That intermediate step is parsing the source code in bytecode-based JavaScript engines.)
You accuse me of being a JavaScript bigot, but I’m not [1]. I am not prescribing how things should work in the future. I’m merely describing the current state of affairs. Sticking to source code should make it easy to migrate to other kinds of execution engines, should the need arise in the future.
[1] http://www.2ality.com/2012/09/javascript-glass-half-full.html
Another thing
I see you try to excuse with blaming Java
But Java is not the only one VM in this world
I recommend you to see .NET Framework. It's not perfect, I will not lie to you, but it tried to be designed for multiple language than Java. It even have feature that C# cannot used but it's there to support other language, especially Managed C++
Also there are LLVM which is more general and higher performance than CLR but with the tradeoff of the runtime code generation
JVM is sucks, especially in client, don't grab it to compare with any client side technology. I really hate JVM for many things and the main reason is it try to cut choice out for simplicity like Java. JVM was designed only for Java so it is the worst CLR technology. Really a shame legacy compare to C#
> There is no single bytecode to “rule all languages”
Yes, that's right. But JavaScript is obviously unsuitable (wrong) as a bytecode.
I've tried 30+ compilers listed here (https://github.com/jashkenas/coffee-script/wiki/List-of-languages-that-compile-to-JS) but there are almost no languages which can compile practically to JavaScript. And even "altJS"s such as CoffeeScript and TypeScript are struggling with the flaws of JavaScript (on modules, implicit this etc...).
I can confirm that JavaScript is a scripting language after all, even after evaluating Emscripten and asm.js. We must not stick with JavaScript.
“I've tried 30+ compilers”
Have you documented your endeavors somewhere? Without more details that doesn’t mean much.
“there are almost no languages which can compile practically to JavaScript”
What does that mean? There are many languages where compilation works well (GWT, Dart and ClojureScript come to mind).
“even [...] CoffeeScript and TypeScript are struggling with the flaws of JavaScript”
That is very subjective. Many CoffeeScript and TypeScript programmers are quite productive in these languages. For modules, you have several capable solutions. Etc.
“even after evaluating Emscripten and asm.js”
Please: details. Again: several large C projects have been successfully compiled via Emscripten and first asm.js performance numbers look very good.