Depending on your background you might also be surprised by a non-linear growth in Dart run times: increasing input string size by a factor of \(2\) (\(25000\) to \(50000\)) increases running time by a factor of \(4\) (\(244\) to \(949\) milliseconds).
Running benchmark with dart --observe and looking at CPU profile in Observatory reveals an unsuprising picture:
Most of the time is spent doing a substring operation, so why Dart VM’s substring is so much slower than V8’s? They must be implemented completely differently - and indeed they are.
Dart VM implements String.substring(start, end) in a very straightforward manner: it allocates a new String object of length end - start + 1 and copies string contents into this new string. Formally speaking this is an \(O(n)\) implementation - also known as linear time - it requires amount of operations proportional to the length of a substring.
If we take a look at the core of our benchmark then it should become obvious why running times exhibit quadratic growth:
If we assuming that initial length of the string s is \(N\) the the first iteration of the loop creates a substring of length \(N - 1\), next iteration creates substring of length \(N - 2\) and so on, until the last iteration creates a substring of length \(1\). This means the loop requires
The \(O(N^2)\) complexity of the above loop is precisely the reason why it’s not a good idea to iterate a string by slicing away processed pieces, unless of course your runtime does “a bit” of magic internally and optimizes for this particular pattern… unsurprisingly V8 does.
If we take a look into V8 sources we discover a somewhat intimidating zoo of different string representations, each optimizing for some particular use case (indexing, concatenation, slicing):
Understanding differences between different string representations used by different JS runtimes is usually a key to understanding why your string manipulation code performs in a certain way. For example we could digress and discuss why
runs 60 times faster in SpiderMonkey than in V8 - but unfortunately that’s beyond the scope of this post.
[With V8 nothing is ever implemented just in a single place, so lost souls searching for substring implementation might also find themselves digging into CodeStubAssembler::SubString which constructs TurboFan graph that would be compiled down to machine code to serve as the SubString stub - which essentially implements fast paths for the C++ logic above.]
Translating the C++ logic to Dart we would get something like this:
Given such a drastic performance improvement why would not we implement a similar optimization in Dart VM? Well, this substring optimization comes with a dangerous trap built in: it leads to surprising memory leaks:
Counterintuitively obj above retains the whole 10Gb input string instead of a small 20 character token - because this token is internally represented as a SlicedString that points back to the source string. This might seem like a contrived example but leaks like this do tend to happen in the real world: here is a three.js issue describing how they had to work around this issue, here is a blog post from somebody who had to hunt for this leak in production. Eagerness of a runtime to fall over itself and make your code faster with clever hidden optimizations has an ugly side too. Issue 2869 tracks progress of fixing this on the V8 side, but nothing has really happened since 2013, probably because the only simple and robust solution is to remove sliced strings altogether. Interestingly that’s precisely what Java did - they used to implement String.substring in \(O(1)\) time by reusing parent String’s char storage for the substring object but that lead to memory leaks and was eventually removed in 2012. V8 history with string slices is even more curious: originally V8 had them, then removed them in 2009, then added it back in 2011.
This all is somewhat bad news for the original Dart SDK bug which prompted this trip down the memory lane. Indeed Dart VM is unlikely to implement substring optimization through sliced-strings. That’s why I decided to probe: why exactly they are measuring substring performance?
Part II: Enter less_dart.
Looking at less_dart benchmark in the Observatory reveals the following picture:
That is Less parser is doing exactly what I would not recommend - iterating through the input string using substring operation. Looking into the source reveals the following code and the reason for using substring becomes obvious:
Authors of Less decided to parse input using regular expressions instead of writing custom lexer, however in pre-ES6 world lexing with RegExp-s was convoluted by the fact that you could not take a regular expression and easily check if it matches at the given position, which is what lexer needs to do as it advances through the string and breaks it into tokens.
[The flag y is apparently named after yylex which is part of the Lex API (not to be confused with Lexx which is a spaceship based on the organic technology which is far beyond even things included into ES7). The name sticky in turn was apparently chosen because it ends with y.]
However what is the way to implement match in a pre-ES6 JS engine? A very naive approach would be to do something like this:
This however is extremely inefficient because calling match(/\d+/g) would essentially search from this.idx forward for the first sequence of digits and then discard the match unless it occurred at this.idx.
A person with a bit more insight into RegExp features might come up with the following optimization:
However a much more common and straightforward way to implement match would be to use substring and anchored regexps:
This is exactly the code we see in the less.js and less_dart - fortunately for less.js V8 runtime implements String.prototype.substring as a \(O(1)\) operation, unfortunately for less_dart Dart VM does not.
What Dart VM does implement is RegExp.matchAsPrefix method - which essentially performs a sticky match of the given regular expression at the given position. Hooray! ☺
However when I did necessary changes to the less_dart code I discovered that it actually became several times slower. Hmm. ☹
It turns out there was a little TODO inside VM’s RegExp implementation:
Obviously the solution was to fix VM by implementing RegExp.matchAsPrefix in the same way V8 implements sticky flag - which is rather simple because Dart VM is using essentially the same regexp engine as one in V8 called Irregexp.
[Maybe we should have called Dart VM port IrIrRegexp because instead of translating RegExp specific intermediate representation (IR) down to the machine code Dart VM translates it further into our machine independent IR and let the generic compiler pipeline take care of it. This allowed us to incorporate Irregexp into Dart VM without porting any machine specific regexp related backend code.]
So I just went and talked to Erik Corry who was one of the minds behind original Irregexp about his implementation of sticky flag. Then I just ported it to Dart VM… and found and fixed a very minor bug in V8’s sticky implementation while doing that.
With RegExp.matchAsPrefix implemented in the efficient way my fixes in the less_dart gave an expected performance improvements brining less_dart and less.js neck to neck with each other.
Part III: The trap of RegExps.
I could have finished my post on this positive note, but I would like to circle back to the common idea of using regexps to tokenize the source code.
A very important thing to realize here is that while regular expressions are certainly a convenient way to lex - they are in no way the most efficient way to do it - there are simply too many abstractions layers involved.
If we take our previous attempts at tokenizing with regexps and benchmark them on a simple string like ("aaaaa aaaa ".repeat(50) + "10 ").repeat(10000) then
we will see the following performance numbers:
Can you go any faster? Lets erase abstraction barriers and go full manual:
Benchmarking this tokenizer with the very same benchmark I used for others:
reveals the following result:
The main thing this demonstrates - is that you program can get much faster if it’s doing less work. For example our manual tokenizer simply glides through the string without allocating any substring objects for tokens. However if we add var val = l.val; inside the benchmark loop to force this meaningless allocation we will still see performance that is much better than our RegExp based parsers:
For the sake of completeness here is the same kind of lexer benchmark in Dart:
Dart VM does respectable job of this code:
So overall performance advice can be summarized as follows:
When using RegExps for lexing use sticky ones in node and use RegExp.matchAsPrefix in Dart VM.
Strongly consider writing your lexer by hand instead of using regular expressions.
The less work your program is doing the faster it is. The less abstraction layers are involved the simpler it is to establish how much work your program is actually doing and decrease the amount.