Kiesel Devlog #1: Now passing 25% of test262!
Published on 2023-09-10.
Except for the initial announcement and a couple of fedi posts I haven't talked much about the latest project I'm working on - writing a JS engine from scratch in Zig!
A little over four months and exactly 600 commits later it now passes 25% of test262 (20% only a week ago!), so this seems like a good time to write the first devlog :^)
Humble Beginnings
commit 929bb044ffcd1776c850a5b924f9992915277e58
Author: Linus Groh
Date: Fri Apr 28 19:48:13 2023 +0100
Initial commit
Writing another JS engine seems like a fun project for learning Zig :^)
There was nothing useful in this commit yet; it contains a hello world generated by zig init-exe. Over the next few weeks I stubbed out some widely used language building blocks (Agent, Realm, PropertyDescriptor, …), the fundamental Value type, and most importantly the object model.
I spent the last three years or so working on LibJS, SerenityOS's JS engine used in the Ladybird browser, hence "another". Already having a solid mental model of the ECMAScript spec meant that I could fully focus on learning Zig as a new language instead of having to wrap my head around JS language concepts.
Or, as domi puts it:
the "an average person writes two javascript engines in their life" factoid is false. A statistical person writes 0 javascript engines; Linus Groh, who writes them for fun is an outlier and should not have been counted
And indeed, it is lots of fun :^)
Until I made a basic tokenizer, parser, and bytecode VM, everything was glued together by hand, which looked something like this (adapted for a couple of API changes):
pub fn main() !void {
    var agent = try Agent.init(gc.allocator(), .{});
    defer agent.deinit();
    try Realm.initializeHostDefinedRealm(&agent, .{});

    // object1 has no prototype and gets a data property foo = 123.
    const object1 = try builtins.Object.create(&agent, .{
        .prototype = null,
    });
    _ = try object1.internalMethods().defineOwnProperty(
        object1,
        PropertyKey.from("foo"),
        PropertyDescriptor{ .value = Value.from(123) },
    );

    // object2 inherits from object1.
    const object2 = try builtins.Object.create(&agent, .{
        .prototype = object1,
    });

    // Look up foo on object2 with object2 itself as the receiver.
    const value = try object2.internalMethods().get(object2, PropertyKey.from("foo"), Value.from(object2));
    std.debug.print("object2.foo = {any}\n", .{value});
}
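For readers more at home in JavaScript than Zig, the hand-glued setup above corresponds roughly to the script below. This is only a sketch of the equivalent language-level operations, not code from the Kiesel repository; the variable names simply mirror the Zig snippet.

```javascript
// object1 has no prototype, like `.prototype = null` in the Zig version.
const object1 = Object.create(null);

// A descriptor with only `value` leaves writable, enumerable and
// configurable at their default of false.
Object.defineProperty(object1, "foo", { value: 123 });

// object2 inherits from object1.
const object2 = Object.create(object1);

// Reflect.get performs the lookup with object2 as the receiver and
// finds foo on the prototype chain.
const value = Reflect.get(object2, "foo", object2);
console.log("object2.foo = " + value);
```

The explicit receiver argument matches the third parameter passed to get() in the Zig code; for a plain data property it makes no observable difference.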
What's Implemented So Far?
After getting the basics up and running I mostly focused on implementing enough syntax and builtins to run test262, the official ECMAScript conformance test suite. This was planned from the very beginning so I could avoid having to write my own test runner and suite.
Not focusing on running test262 early on is also one of my big regrets from LibJS (as well as not getting objects and realms right from the beginning), so that was to be avoided.
Turns out: you don't need a huge amount of features for test262, which is great! So little in fact that I only implemented for loops yesterday to get another test harness file working.
Builtins
JS has a lot of built-in functions and keeps getting more, so this will take a while. Sometimes I target missing functions that the test262 harness uses, since their absence makes tests for already implemented functionality fail, but for the most part I randomly pick something that seems fun to work on :^)
Yes, there's a partial implementation of eval() (before pretty-printing and when the code was still on GitHub). Why do you ask?
The famous demo from the Wat talk also works, and more recently I added proxies:
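As an illustration of the kind of code involved, here is a small sketch: the well-known closing line of the Wat talk, followed by a minimal proxy with a get trap. Both show standard ECMAScript behavior when run in a mainstream engine; this is not captured Kiesel output, and the names target and proxy are made up for the example.

```javascript
// "wat" - 1 is NaN, and join() puts the separator between the 16 empty
// slots, so "NaN" appears 15 times in the result.
var batman = Array(16).join("wat" - 1) + " Batman!";
console.log(batman);

// A proxy whose get trap supplies a fallback for missing properties.
var target = { foo: 123 };
var proxy = new Proxy(target, {
  get: function (obj, key, receiver) {
    return key in obj ? Reflect.get(obj, key, receiver) : "wat";
  },
});
console.log(proxy.foo, proxy.bar);
```

The trap forwards existing properties through Reflect.get, so only lookups of properties missing from the target are intercepted.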
I also added a couple of non-standard functions (mostly for test262), but they're implemented in the kiesel utility, not the engine itself:
Kiesel.gc.collect()
Kiesel.createRealm()
Kiesel.evalScript()
Kiesel.print() (no console object yet)
Full list of currently implemented builtins 📝
globalThis
Infinity
NaN
undefined
eval()
isFinite()
isNaN()
Array()
Array.isArray()
Array.of()
Array.prototype.length
Array.prototype.at()
Array.prototype.every()
Array.prototype.find()
Array.prototype.findIndex()
Array.prototype.findLast()
Array.prototype.findLastIndex()
Array.prototype.forEach()
Array.prototype.includes()
Array.prototype.indexOf()
Array.prototype.join()
Array.prototype.lastIndexOf()
Array.prototype.map()
Array.prototype.pop()
Array.prototype.push()
Array.prototype.some()
Array.prototype.toLocaleString()
Array.prototype.toString()
Array.prototype.with()
BigInt()
BigInt.prototype.toLocaleString()
BigInt.prototype.toString()
BigInt.prototype.valueOf()
Boolean()
Boolean.prototype.toString()
Boolean.prototype.valueOf()
Error()
Error.prototype.message
Error.prototype.name
Error.prototype.toString()
NativeError() (EvalError, RangeError, ReferenceError, SyntaxError, TypeError, URIError)
NativeError.prototype.message
NativeError.prototype.name
Function()
Function.prototype.apply()
Function.prototype.call()
Function.prototype.toString()
Math.E
Math.LN10
Math.LN2
Math.LOG10E
Math.LOG2E
Math.PI
Math.SQRT1_2
Math.SQRT2
Math.abs()
Math.ceil()
Math.clz32()
Math.floor()
Math.pow()
Math.random()
Math.round()
Math.sign()
Math.trunc()
Number()
Number.EPSILON
Number.MAX_SAFE_INTEGER
Number.MAX_VALUE
Number.MIN_SAFE_INTEGER
Number.MIN_VALUE
Number.NaN
Number.NEGATIVE_INFINITY
Number.POSITIVE_INFINITY
Number.isFinite()
Number.isInteger()
Number.isNaN()
Number.isSafeInteger()
Number.prototype.toLocaleString()
Number.prototype.toString()
Number.prototype.valueOf()
Object()
Object.assign()
Object.create()
Object.defineProperties()
Object.defineProperty()
Object.entries()
Object.freeze()
Object.getOwnPropertyDescriptor()
Object.getOwnPropertyDescriptors()
Object.getOwnPropertyNames()
Object.getOwnPropertySymbols()
Object.getPrototypeOf()
Object.hasOwn()
Object.is()
Object.isExtensible()
Object.isFrozen()
Object.isSealed()
Object.keys()
Object.preventExtensions()
Object.seal()
Object.setPrototypeOf()
Object.values()
Object.prototype.hasOwnProperty()
Object.prototype.isPrototypeOf()
Object.prototype.propertyIsEnumerable()
Object.prototype.toLocaleString()
Object.prototype.toString()
Object.prototype.valueOf()
Proxy()
Proxy.revocable()
Reflect.apply()
Reflect.construct()
Reflect.defineProperty()
Reflect.deleteProperty()
Reflect.get()
Reflect.getOwnPropertyDescriptor()
Reflect.getPrototypeOf()
Reflect.has()
Reflect.isExtensible()
Reflect.ownKeys()
Reflect.preventExtensions()
Reflect.set()
Reflect.setPrototypeOf()
String()
String.prototype.charAt()
String.prototype.charCodeAt()
String.prototype.toString()
String.prototype.valueOf()
Symbol()
Symbol.asyncIterator
Symbol.for()
Symbol.hasInstance
Symbol.isConcatSpreadable
Symbol.iterator
Symbol.keyFor()
Symbol.match
Symbol.matchAll
Symbol.prototype
Symbol.replace
Symbol.search
Symbol.species
Symbol.split
Symbol.toPrimitive
Symbol.toStringTag
Symbol.unscopables
Symbol.prototype.toString()
Symbol.prototype.valueOf()
Syntax
I'm using xq's fantastic parser-toolkit library. Translating the context-free grammar from the ECMAScript spec into that isn't always straightforward, but so far I'm not feeling awful about the parser. It's currently 1.4k lines and somehow hasn't turned into a pile of spaghetti code yet. :^)
Full list of currently implemented syntax features 📝
- Literals:
  - true/false/null (undefined is not a literal :^))
  - numbers
  - bigints
  - strings (no template literals or escapes)
  - arrays (including array holes)
  - objects (no function property shorthands)
- Statements:
  - blocks
  - var
  - for
  - while
  - do/while
  - if/else
  - try/catch/finally
  - throw
  - return
  - debugger
  - empty
- Functions:
  - function declarations
  - function expressions
  - arrow functions
- Other expressions:
  - identifier references
  - member expressions
  - call expressions
  - new expressions
  - update expressions (prefix/suffix ++/--)
  - unary expressions (prefix delete, void, typeof, +, -, ~, !)
  - binary expressions (**, *, /, %, +, -, <<, >>, >>>, &, ^, |)
  - relational expressions (<, >, <=, >=, instanceof, in)
  - equality expressions (==, !=, ===, !==)
  - logical expressions (&&, ||, ??)
  - conditional expressions (a ? b : c)
  - assignment expressions (=, *=, /=, %=, +=, -=, <<=, >>=, >>>=, &=, ^=, |=, **=, &&=, ||=, ??=)
  - sequence expressions (, operator)
Lexical declarations are parsed but use the same bytecode and scoping rules as var declarations; this was to unbreak tests that relied on simple variable assignment using let or const for unrelated functionality.
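To make the consequence concrete, here is a small sketch of where the two scoping models diverge. In a spec-compliant engine a block-scoped let is invisible outside its block, while var is hoisted to the enclosing function. Under the shortcut described above, the second function would presumably behave like the first. The function names are made up for illustration.

```javascript
function withVar() {
  {
    var x = 1;
  }
  // var is function-scoped, so x is still visible here.
  return typeof x;
}

function withLet() {
  {
    let y = 1;
  }
  // let is block-scoped, so y is not in scope here; typeof on an
  // unresolvable name yields "undefined" instead of throwing.
  return typeof y;
}

console.log(withVar()); // number
console.log(withLet()); // undefined (in a spec-compliant engine)
```

Tests that only assign to and read from a let or const binding within one scope cannot observe this difference, which is why the shortcut is enough to unbreak them.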
I prefer working on runtime over syntax/language, mostly because the abstractions in Kiesel for that are much nicer compared to parsing and bytecode generation. So if you're wondering why a certain language feature is missing: that's why.
What's Missing?
Everything else, of course :^)
I consider the entire latest ECMAScript draft to be in scope; only a handful of language features marked "Normative Optional, Legacy" will be skipped.
There's no roadmap, but two major features needed to unlock a bunch of runtime functionality are promises and iterators, so I'll likely work on those soon. Lesser-used builtins like Weak{Map,Ref,Set} or Atomics are low on my priority list and will take longer to appear.
A huge deficiency is the current error reporting, which manifests in two ways:
- There are no tracebacks, so finding the exact error source can involve some guesswork and/or printing out the current bytecode offset:

    $ cat -p foo.js
    throw new Error("oh no!")
    $ kiesel foo.js
    Uncaught exception: Error: oh no!
    $

- The majority of early errors (syntax errors for certain invalid constructs) are not implemented, and the parser will generally backtrack to the construct it's currently parsing when encountering syntax it doesn't understand (with statements in this case):

    $ cat foo.js
    if (foo) {
        // a comment
    }
    function bar() {
        with (baz) {}
    }
    $ kiesel foo.js
    Uncaught exception: SyntaxError: Unexpected token 'function' (foo.js:3:2)
    $

(there's also something going wrong with the source location but that's another story…)
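For context, an early error is one the spec requires to be reported before any of the code runs. The sketch below shows how that looks in an engine that implements them, using the Function constructor to parse source at runtime so the surrounding script still loads. The helper isEarlyError is a hypothetical name for this example, and the results in the comments are what a spec-compliant engine reports, not Kiesel output.

```javascript
// Returns true if parsing `source` as a function body throws a SyntaxError.
// The body is only parsed, never called.
function isEarlyError(source) {
  try {
    new Function(source);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// Redeclaring a lexical binding in the same scope is an early error.
console.log(isEarlyError("let a; let a;")); // true

// `with` statements are an early error in strict mode code...
console.log(isEarlyError("'use strict'; with (o) {}")); // true

// ...but the same statement is valid in sloppy mode.
console.log(isEarlyError("with (o) {}")); // false
```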
Thoughts On Zig
As I mentioned at the beginning, the whole point of this was to learn Zig, which I did — I'm by no means an expert but would probably call myself proficient already. It's a joy to use, despite being a 0.x project including compiler bugs and breaking changes (which I was fully aware of from the beginning). They're a relatively small team with large ambitions, and I like that :^)
You can see what it's all about on the website. I did find the standard library lacking on a few occasions, e.g. there seems to be no function for checking if a "list of strings" (slices, []const []const u8) contains a string ([]const u8), and for (haystack) |value| { if (std.mem.eql(u8, value, needle)) break true; } else false ain't it. I have this in a helper function for now :^)
This also happens to be the first time I learned a language fully out of interest (I need a project for this language I want to learn), not out of necessity (I need to learn a language for this project I want to work on)!
Oh, btw…
…I also want to mention porffor:
a basic wip js aot optimizing wasm compiler in js
It's another new JS engine project (compiling JS to Wasm, written in JS), and I often talk to CanadaHonk while we're both hacking away on different things or comparing functionality or benchmarks for fun (porffor-compiled code runs on the V8 Wasm engine and thus always wins).
They also built the incredibly useful test262.fyi site, which involved forking and modernizing some existing tooling (esvu, eshost, test262-harness). And of course it features results from Kiesel :^)