NQP gets MoarVM support, cursor reduction, and other news

Originally published on 2013-10-11 by Jonathan Worthington.

I thought it was high time for a round-up of news about the Raku related stuff that I’ve been working on – as usual, with the help of many others. So, here goes!

MoarVM Backend in NQP master

In the last progress update, I mentioned that MoarVM could host NQP and pass most of the NQP language test suite. We got to that point by using a cross-compiler running on Parrot to compile the NQP sources into MoarVM bytecode. Fast forward a month, and we’ve reached the point of having NQP bootstrapped on MoarVM. What’s the difference? Simply, that you can use NQP running on MoarVM in order to build an NQP from source. This means that both bytecode generation and serialization are now working. Since some of the source files we’re compiling are also fairly large (4000 lines), it’s also been a fairly good hardening exercise; many bugs have been fixed.

The MoarVM support has now been added to the NQP repository. Just as you only need a JVM to build NQP for the JVM, you also only need MoarVM to build an NQP for MoarVM. That is, it’s bootstrapped and can stand alone now. NQP monthly releases from here on will thus come with support for three backends: Parrot, JVM and MoarVM.

Better still, NQP on MoarVM now passes the full NQP test suite. It also does so faster than any other backend. This doesn’t mean MoarVM is faster for all things; if you write NQP code with a tight integer loop, or something long running, the JVM’s JIT will kick in and typically come out ahead after the slow start. MoarVM just gets moving a lot faster than the JVM, which is what matters for running lots of short-lived tests.

This means we’re now ready to start work on getting Rakudo running on MoarVM. I don’t have any predictions just yet on precisely when we might land this; the rate of progress in the next couple of weeks, as we dig into it, will provide an indication. Reaching the point of being able to bootstrap NQP on MoarVM is a significant achievement along the way, though. While NQP is both smaller and simpler than full Raku, it still requires being able to execute Raku grammars, eval code, do a whole range of OO-related things (including classes, roles and meta-programming), perform multiple dispatch, handle BEGIN time, and so on. These are all things that a full Raku needs, so it’s very good to have them in place.

Allocating Less Cursors

Allocation profiling work by hoelzro and timotimo (I hope I remembered right; correct me if not!) indicated that both NQP and Rakudo were allocating a huge number of Cursor objects. So what are these? They are objects that keep track of parsing state. They are created as we enter a rule or token in the grammar, thrown away if it fails, or typically placed into a tree of cursors if it passes (though sometimes we only care for pass/fail and so quickly throw it away again). Naturally, we’re going to allocate quite a few of these, but 2.14 million of them being allocated while parsing Rakudo’s CORE.setting seemed decidedly excessive.

I decided to spend some time trying to understand where they all came from. NQP itself tends to be a bit easier to analyze than Rakudo, so I started there, and added some instrumentation to record the number of times each production rule or token was hit and led to allocation of a Cursor. I then used this to gather statistics on a fairly large source file from the NQP build. It started out at 284,742 Cursor allocations.

The first big win from this came when I realized that a huge number of these Cursors were just allocated and returned, to indicate failure. A Cursor is born in a failed state, and many places were just creating them and returning them without any further work, to say “I failed to match”. Thing is, failed Cursor objects all look about the same. The only thing they differ by is type, and for each Cursor type we have a ParseShared instance. Thus, it wasn’t too hard to create a shared “fail Cursor” and just look it up in a bunch of places, rather than allocating. That shaved off over 50,000 allocations. A related realization about improving MARKED and MARKER led to another 30,000 or so allocations chopped. Note that none of this really leads to overall reduced memory usage; all of these objects were quickly collectable. But allocation makes GC run more often, and thus carries some amount of runtime cost.

Further examination of the statistics showed hotspots that seemed unreasonable. The problem wasn’t that they allocated a Cursor needlessly, but rather than we should never have been calling the grammar rule in question so many times. This led to a number of grammar optimizations. In one case, just making sure that a certain rule got a declarative prefix meant the LTM mechanism could very quickly rule it out, saving 15,000 calls to the production rule, and thus cursors. The other big discovery was that the ident built-in was not having an NFA properly generated for it. This is notable because ident features in various paths in the grammar, and thus meant we were failing to quickly rule out a lot of impossible paths through the grammar using the NFA (which is the cheap way to rule things out). With a few other tweaks, all told, we were down to 172,424 Cursor allocations, just 60% of what we started out with.

The statistics for Rakudo’s CORE.setting showed we started out doing 2,196,370 Cursor allocations. Some of the fixes above also aided Rakudo directly, making a good cut into this number. However, analysis of the statistics revealed there were more wins to be had. So far, we managed to bring it down to 1,231,085 – a huge reduction. There’s likely a lot more wins to be had here, but I think we’ve picked all of the large, low-hanging fruit by now.

Rakudo Debugger on JVM

I spent some time getting the Rakudo Debugger to also work on the JVM. I’ve still got some work to do on making it easily buildable, and so it can be installed with Panda on the JVM like it can on Parrot. But the rest of the work is done, and I was able to step through programs, set breakpoints, examine variables and so forth. So, that’s another of the “things missing on the JVM port” mostly gone.

Rakudo on JVM spectest

I also did a little work to fix more of the spectests that pass on Rakudo Parrot, but fail on Rakudo JVM. The result: 99.9% of the spectests that pass on Rakudo Parrot also pass on Rakudo JVM. To put an exact number on it, that’s 27 tests different. Around half are Unicode related, which isn’t so surprising since we’ve borrowed Java strings for the time being; in the end, we’ll need to do an NFG implementation on the JVM. The others still need to be hunted down. I think it’s fair to say that you’re rather unlikely to run into this 0.1% in day-to-day usage of Rakudo on the JVM, however. In fact, the thing you’re most likely to miss today is NativeCall – which arnsholt has been working on, and I plan to join in with during the coming days.

All in all, things are coming along very nicely with Rakudo’s JVM support. It was just 11 months ago – not even a year – that I started to build a little bit of infrastructure so we’d be able to create a tree-like data structure in NQP and get JVM bytecode made from it. By now, thanks to the contributions of many besides me, Rakudo runs on the JVM and is very capable there. When it comes to concurrency, it’s the most capable Rakudo you can get. And, once it gets past the slow startup, it’s usually the fastest. It’ll be interesting to see where we stand on these things in another six months or a year’s time. Here’s hoping for another productive time ahead!