A look at the preparations behind the JVM port, and a progress update
Originally published on 2013-02-01 by Jonathan Worthington.
After my last post giving a status update on the JVM porting of NQP and the compiler toolchain Rakudo builds upon, *hercynium*++ left a comment suggesting that I also blog about the design work behind this effort. I liked the idea, and in this post I’ll attempt to describe it a bit. I can’t possibly capture all of the interesting things in a single post, so if this doesn’t cover aspects that are particularly interesting to anybody reading, let me know and I’ll try and find time to write something on them. :-)
### It started long ago…
The first commit to the repository where I’m doing the initial porting work to the JVM may have been back in November, but that isn’t where the journey really started. We’ve known for multiple years now that we would want Rakudo and NQP to target backends besides Parrot. In that time, we’ve had to build a lot of technology in order to be able to build Rakudo at all. Some things we’ve had to build more than once because the first time didn’t produce something satisfactory (where satisfactory means “actually meets our needs”, not “is the most awesome thing ever possible”). Software is, fairly often, as much about learning as it is about building. The more complex the domain you’re working in, the more this applies, and the more likely it is that you’ll have to build one to throw away. By now we’ve thrown away a parser engine, an AST, and about 3 implementations of roles. :-)
Of course, there’s the build/buy thing, where buy in open source really means “buy into”, as in use an existing library. We’ve done a bunch of that too, such as libtommath for our big integer support and dyncall for NativeCall support. But the closer something is to the “core domain” – the thing that makes your product distinctive and special – the less able you are to use something off the shelf. Parsing Raku really needs to be done with a Raku grammar, using Longest Token Matching. Its object system really needs something that supports meta-programming, representation polymorphism and gradual typing. Getting BEGIN/eval right and supporting compilation and having the possibility for lexical and anonymous types and packages, which can be dynamically constructed and exported, also left us with something to build (this is the work that led to bounded serialization).
Eventual portability has been a design factor in what we’ve built for quite a while. While the only 6model implementation to have become complete enough to support all of Rakudo’s object needs so far is the one running on Parrot, the initial prototypes of 6model were done on the .Net CLR. This was in no small part to make sure that there was a feasible way to implement it on such a VM. Granted, what I actually discovered was a less than awesome way to build it on the CLR (and what I’m doing on the JVM this time around fits in far better with the JVM’s world view). But it was a design consideration from the start.
When we updated PAST, the previous AST representation, to QAST (Q is just P++ :-)), portability was once again a concern; the VM-specific bits were all placed under a `QAST::VM` node type. This makes it easy to escape to the underlying VM where needed or most expedient, but it’s both explicit and done in a way that allows specification of what to do on other backends. As part of this work we also built support for the nqp::op abstraction directly into the AST format. The nqp::ops form an opcode set independent of any particular VM. They get mapped as part of turning a QAST tree into code for the target backend (meaning there’s no overhead for them in the generated code). They may map directly to the VM’s opcodes, to a function or method call in the VM, or to some more complex code generation.
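To make that concrete, here’s a tiny, hypothetical sketch in Java (my illustration, not the actual backend code) of the kind of mapping involved: some nqp::ops can go straight to a JVM instruction, while others become a call into a runtime support class. The instruction stream is just a list of strings here, purely to show the shape of the idea; `Ops.iseq_s` is an invented runtime class name.

```` java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models the idea of mapping nqp::ops either to a
// direct JVM instruction or to a call into a runtime support class.
// Instructions are represented as strings rather than real bytecode.
public class OpMapperSketch {
    interface OpEmitter {
        void emit(List<String> instructions);
    }

    private static final Map<String, OpEmitter> OPS = new HashMap<>();
    static {
        // nqp::add_i works on native ints, 64-bit on the JVM, so it
        // could map straight to long addition.
        OPS.put("add_i", ins -> ins.add("ladd"));
        // nqp::iseq_s has no single opcode; emit a call to a runtime
        // method instead (hypothetical runtime class name).
        OPS.put("iseq_s", ins -> ins.add("invokestatic Ops.iseq_s"));
    }

    // The QAST-to-bytecode phase would call this after compiling the
    // op's children (which leave their results on the stack).
    static void compileOp(String name, List<String> instructions) {
        OpEmitter emitter = OPS.get(name);
        if (emitter == null)
            throw new IllegalArgumentException("Unknown nqp::op: " + name);
        emitter.emit(instructions);
    }

    public static void main(String[] args) {
        List<String> ins = new ArrayList<>();
        compileOp("add_i", ins);
        compileOp("iseq_s", ins);
        System.out.println(ins);  // [ladd, invokestatic Ops.iseq_s]
    }
}
````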
The other important piece of the groundwork for portability is that we implemented Rakudo in a mixture of Raku and NQP, and over time have got NQP to the point where it is also written in NQP (and thus can compile itself). This has been a gradual thing; the earliest NQP editions were written directly in PIR, and with time we’ve migrated those bits to NQP – usually at the same point we were doing other improvements already. For example, *pmichaud*++ wrote the latest edition of the regex engine, with LTM support, in NQP. PAST, written in PIR, was replaced by QAST, written in NQP. And 6model’s meta-objects were, from the start, expressed in NQP too. It’s pretty neat that NQP’s definition of things so fundamental as classes is actually written in NQP. It means that we don’t have to port classes and roles, just the primitives they are made out of.
### So digging into the JVM port itself…
With all of the above-mentioned things in place, it was possible to form a fairly concrete roadmap for porting NQP, then Rakudo, over to the JVM. It was important both to be confident that the result would give us a fully functional Rakudo on the JVM and to have an idea of how to get there. It’s easy to implement a subset, but if it isn’t factored in a way that lets you do the rest afterwards then you’re in trouble and it’ll be time for re-work. My hope was that, after some years of learning about things that don’t work and replacing them with things that do, this time much of the re-work could be avoided. A starting point for this was taking a good look at the JVM’s instruction set, as well as considering what JVMs are typically good at doing.
The JVM is a stack machine. This is in contrast to Parrot, which is a register machine. Thankfully, this is mostly a code generation detail rather than being especially deep. As well as the stack, a given method can have local variables (note that everything that contains code on the JVM is called a method, even subroutines, but they call them static methods because it sounds so much more OO :-)). These can hold longer-lived things, so in a sense could be used a bit like Parrot registers. In general, the code generation from QAST uses the stack where possible and falls back to locals where needed. This is because stack usage fits well with what a JVM expects to be doing, and also what its bytecode format expresses most compactly.
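As a tiny illustration (mine, not output from the porting work), here’s the kind of purely stack-based code the JVM thrives on; javac compiles the method below to something along the lines of the instruction sequence shown in the comment.

```` java
// A trivial static method. javac compiles the body to stack-based
// bytecode along the lines of:
//   iload_0  iload_1  iload_2  imul  iadd  ireturn
// The arguments live in locals 0-2; they're pushed, combined on the
// stack, and the result returned, with no extra named temporaries.
class StackExample {
    static int eval(int a, int b, int c) {
        return a + b * c;
    }
}
````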
Locals have an important restriction: they can only be accessed directly in the scope where they are declared. There is no notion of nested methods at the JVM level. This means that locals are not suitable for implementing lexical variables. Thankfully, there is a well established solution: promote such things onto the heap, keeping them in some kind of invocation record. This is what happens with closures in C# on the CLR, for example. There are a bunch of ways to do this transform, with various trade-offs. I’ve done one that was fairly fast to implement, but also enables lookup by array indexing rather than needing a named (hash) lookup in the vast majority of cases. As well as an array index being algorithmically cheaper than a hash lookup, the JVM supports array indexing natively in its opcode set, but not hash lookups.
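Here’s a minimal sketch of that idea (the names are mine, not the actual classes the port uses): lexicals live in an array inside a heap-allocated frame object, closures hold a reference to their outer frame, and a name-to-index map is only consulted in the rare late-bound cases.

```` java
import java.util.HashMap;
import java.util.Map;

// Sketch of a heap-allocated invocation record holding lexicals.
// Most lookups compile down to a plain array index; the name map
// exists only for late-bound lookups.
public class FrameSketch {
    final FrameSketch outer;          // enclosing frame, for closures
    final Object[] lexicals;          // lexical storage, accessed by index
    final Map<String, Integer> names; // fallback for lookup by name

    FrameSketch(FrameSketch outer, int numLexicals, Map<String, Integer> names) {
        this.outer = outer;
        this.lexicals = new Object[numLexicals];
        this.names = names;
    }

    // The common case: index known at compile time, possibly in an
    // outer frame a fixed number of hops away.
    Object getLexical(int hops, int index) {
        FrameSketch frame = this;
        while (hops-- > 0)
            frame = frame.outer;
        return frame.lexicals[index];
    }

    // The rare case: look the name up, walking outward until found.
    Object getLexicalByName(String name) {
        for (FrameSketch frame = this; frame != null; frame = frame.outer) {
            Integer idx = frame.names.get(name);
            if (idx != null)
                return frame.lexicals[idx];
        }
        throw new RuntimeException("No such lexical: " + name);
    }

    public static void main(String[] args) {
        Map<String, Integer> outerNames = new HashMap<>();
        outerNames.put("$x", 0);
        FrameSketch outer = new FrameSketch(null, 1, outerNames);
        outer.lexicals[0] = 42;

        FrameSketch inner = new FrameSketch(outer, 0, new HashMap<>());
        System.out.println(inner.getLexical(1, 0));        // 42
        System.out.println(inner.getLexicalByName("$x"));  // 42
    }
}
````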
Talking of allocating things on the heap brings us nicely to objects. JVMs are very good at fast allocation and collection of objects, because they have to be; there is no stack allocation in Java of anything non-trivial. Of course, that doesn’t mean the VM can’t do escape analysis and stack allocate under the hood. That the VM is really good at object allocation and GC means we don’t need to worry too much about lexicals leading to invocation records on the heap; there are plenty of performant uses of this approach in the wild. Furthermore, most records will be very short lived, nicely meeting the generational hypothesis (which is that most objects are either short lived or long lived, and so we can optimize separately for each through doing generational garbage collection).
While invocation records are relatively internal, of course NQP and Raku involve lots of user-visible objects. From the things you think about as objects (and call “new” on) to things like scalar containers, strings, boxed integers and so forth, both NQP and Raku lead to plenty of allocations. While some things are quite predictably shaped, most come from user class definitions. Ideally, we’d like it if a Raku class definition like:
```` raku
class Point {
    has $!surface;
    has num $!x;
    has num $!y;
}
````
used memory much as if you had written something like this in Java:
```` java
class Point {
    private Object surface;
    private double x;
    private double y;
}
````
At the same time, we know that the JVM’s idea of type is some way off the Raku notion of type, so we can’t simply turn Raku classes into JVM classes. Thankfully, 6model has from the start been designed around the idea of representation polymorphism. Really, this is just a separation of concerns: we decouple the details of memory representation and access from the notion of being a type and dispatch. The former is handled by a representation, and the latter two by a meta-object. One early but important observation I made when designing 6model is that the representation will always couple closely to the underlying runtime (and thus would need to be implemented for each runtime we wanted to run on), whereas the other bits can be expressed in a higher level way, with the common cases made efficient by caching. Thus there’s no reason to re-implement classes and roles per VM, but there is a need to provide a different, VM-specific way to do P6opaque (the default representation for NQP and Raku objects).
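As a rough sketch of that separation of concerns (names invented for illustration; this is not the real 6model API), a representation can be thought of as something that owns allocation and attribute storage, while everything about type and dispatch stays with the meta-object:

```` java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only (names are mine, not the real 6model API).
// A representation owns memory layout and attribute storage; questions
// of type and dispatch stay with the meta-object, which is written in
// the high-level language and shared across backends.
interface ReprSketch {
    Object allocate();
    Object getAttribute(Object obj, String name);
    void bindAttribute(Object obj, String name, Object value);
}

// The simplest possible representation: a hash per object. A real
// P6opaque-style representation would compute a compact layout instead.
class HashReprSketch implements ReprSketch {
    public Object allocate() {
        return new HashMap<String, Object>();
    }
    @SuppressWarnings("unchecked")
    public Object getAttribute(Object obj, String name) {
        return ((Map<String, Object>) obj).get(name);
    }
    @SuppressWarnings("unchecked")
    public void bindAttribute(Object obj, String name, Object value) {
        ((Map<String, Object>) obj).put(name, value);
    }
}

class ReprDemo {
    public static void main(String[] args) {
        ReprSketch repr = new HashReprSketch();
        Object obj = repr.allocate();
        repr.bindAttribute(obj, "$!x", 1.0);
        System.out.println(repr.getAttribute(obj, "$!x"));  // 1.0
    }
}
````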
The C implementation of P6opaque on Parrot works by calculating a memory layout – essentially, figuring out a struct “on the fly”. What’s the JVM equivalent of that? Well, that’s just creating a JVM class on the fly and loading it. Is the JVM capable of doing that? Sure, it’s easily dynamic enough. Furthermore, once we’ve done that little bit of bytecode generation, it’s a perfectly ordinary JVM class. This means that the JIT compiler knows what to do with it. Does doing any of this require changes to the meta-objects for classes in NQP and Rakudo? No, because these details are all encapsulated in the representation. Things like these are good signs for a design; they tend to show that responsibilities have been correctly identified and segregated.
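To show just how routine “make a class and load it at runtime” is on the JVM, here’s a stand-alone sketch that builds a class with the same fields as the Point example above using the ASM bytecode library, then loads it via a small custom class loader. This is purely my illustration; it’s not necessarily how the port itself generates and loads its bytecode.

```` java
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

// Sketch: build a class with the same fields as the Point example
// at runtime, then load it with defineClass.
public class DynamicPointSketch {
    static class Loader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
        cw.visit(Opcodes.V1_8, Opcodes.ACC_PUBLIC, "Point", null,
                 "java/lang/Object", null);

        // One reference field and two unboxed doubles, mirroring the
        // $!surface, num $!x, num $!y attributes above.
        cw.visitField(Opcodes.ACC_PRIVATE, "surface", "Ljava/lang/Object;", null, null).visitEnd();
        cw.visitField(Opcodes.ACC_PRIVATE, "x", "D", null, null).visitEnd();
        cw.visitField(Opcodes.ACC_PRIVATE, "y", "D", null, null).visitEnd();

        // A default constructor that just calls Object's constructor.
        MethodVisitor mv = cw.visitMethod(Opcodes.ACC_PUBLIC, "<init>", "()V", null, null);
        mv.visitCode();
        mv.visitVarInsn(Opcodes.ALOAD, 0);
        mv.visitMethodInsn(Opcodes.INVOKESPECIAL, "java/lang/Object", "<init>", "()V", false);
        mv.visitInsn(Opcodes.RETURN);
        mv.visitMaxs(1, 1);
        mv.visitEnd();
        cw.visitEnd();

        Class<?> point = new Loader().define("Point", cw.toByteArray());
        System.out.println(point + " has " + point.getDeclaredFields().length + " fields");
    }
}
````

Once loaded, the generated class is just another class as far as the JVM is concerned, which is exactly the property we’re after: the JIT sees ordinary fields of ordinary types.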
### So, how’s the cross-compiler going?
Things are going nicely. Having got much of the way there with the NQP MOP, I turned to `ModuleLoader` and started to get together a basic setting (the setting being the place where built-ins are defined). With those in place, work has moved on to trying to pass the NQP test suite.
The build process cross-compiles the MOP, module loader and setting. To run the test suite, each test is taken and cross-compiled against those, then the result of the compilation is run on the JVM. The fact that we invoke NQP and then invoke the JVM (two invocations to run each test) gives quite a bit of fixed overhead per test; once we have NQP itself (that is, the compiler) cross-compiled and self-hosting on the JVM, it’ll be down to a single invocation.
The NQP test suite for the NQP language itself consists of 65 test files. 3 of them are specific to Parrot, so there’s 62 that are interesting to make run. As of today, we pass 46 of those test files in full. While some of those passing tests exercise relatively simple things (literals, operators, variables, conditionals, loops, closures), others exercise more advanced features (classes, roles, basic MOP functionality, runtime mixins and so forth). Of the 16 test files that remain, 9 of them depend on regexes or grammars. Getting those to run will be the focus of the next major chunk of work: porting the regex compiler and getting the NFA, Cursor and Match classes to cross-compile (which will involve some portability improvements). The other 7 relate to non-trivial, but smaller-than-grammars features (for example, 2 are about multiple dispatch, which I’m working on porting at the moment).
It was only three weeks ago that I wrote that the JVM port did not even manage “hello world” yet, and that I had little more to show than something that could turn a range of QAST trees into JVM bytecode. Three weeks later and we’re running around 75% of the NQP test files, and *timotimo*++ was even able to feed an almost unmodified Levenshtein distance implementation written in NQP to the cross-compiler and have it run on the JVM.
So, mad amounts of coding have taken place? Well, only sorta…I’ve taught two three-day classes for $dayjob in the last couple of weeks also. :-) Mostly, progress has been fast now because the foundations it is building upon have proved fairly solid. For the backend, this is in no small part down to having grown a test suite for the QAST to JVM phase of the work as it progressed. The fact we could happily couple this new backend to the existing NQP parser is thanks to the compiler being structured as a pipeline of stages, each one strongly isolated from the others, just passing a data structure between them. In my teaching work, I often encourage automated testing and talk a lot about the importance of enforcing carefully chosen, well-defined, information-centric boundaries between components. It’s pleasing to see these things paying off well in my Raku work also. :-)