Samurais had something called bushidō, way of the warrior, code of conduct they had to follow. In similar manner you have to follow certain opt-dō if you want to optimize your application. I have tried to sketch such a path in my nodecamp.eu talk "Understanding V8" and +Daniel Clifford tried to do the same in "V8 Performance Tuning Tricks" talk on GDD11 in Berlin. But not everybody has seen those talks and the question keeps coming back again. So I decided to write down a quick check list for developers who want to optimize their apps. tl;dr version of my checklist is: "Understand before you act".
Understanding V8 and beyond [talks and posts]
I am maintaining a list of V8 related resources. See it here
Do you understand what the application is trying to do and how?
The more you understand about your app the better you can optimize it. Sometimes a tricky algorithm or a cache placed in right place will yield more improvements than any local tweaking. Understanding your application in large is a very difficult problem which requires special tooling and discipline. I highly recommend to read @coda's Metrics Metrics Everywhere talk if you want to get a glimpse of that world. Sometimes it is possible to split big application into pieces and optimize them separately but there is no guarantee that overall gain will not be lost when those pieces are connected back together.
Did you profile your application with built in statistical profiler?
Profiling helps to discover obvious hot spots. Don't waste time rewriting places that occupy 0.0001% of running time. Concentrate your efforts on those that are high on the profile. If you are using V8's tick processors keep in mind that LazyCompile:
prefix does not mean that this time was spent in compiler, it just means that the function itself was compiled lazily. Statistical profiler is not the most accurate tool in the world and might miss overheads that are finely spread across execution (as sampling interval is 2ms). Tools like dtrace, perf, Instruments, VTune might provide a more fine grained picture but they do not necessarily have support for JITed code (see below).
JavaScript function is high on the profile
Ensure that this function is optimized and Crankshaft friendly. V8's tick processing scripts mark optimized functions with *
(asterisk) and non-optimized with ~
(tilda). You can also use --trace-opt --trace-deopt
flags to see what Crankshaft does with your program. Deoptimizations happen when assumptions made by the compiler does not match program's runtime behavior, bailouts happens when compiler can't compile the function with optimizations for some reason. [note that in V8 prior to version 3.13.4 you'll need to supply --trace-bailout
to see optimizing compiler bailouts] If you want to understand ideas behind V8 optimization pipeline I recommend to start by reading Andy Wingo's A Tale of Two Compilers post.
In general it's a good idea to know more about modern JavaScript VMs, especially their strengths and weaknesses. I recommend going through David Mandelin's talk Know Your Engines. This talk stresses a very important aspect of modern JS performance: fastest application is the one that is essentially statically typed in it's nature.
Modern JavaScript VMs try to grasp "static" structure hidden inside dynamic JS code by utilizing hidden classes and inline caches. Take a look at my slide deck to get a basic understanding of how those hidden classes are built and used.
For V8 it is also important to check how you store floating point numbers (and integers that exceed 31-bit range in case of ia32 version of v8) and use WebGL typed arrays if appropriate. These days V8 tries to adapt generic arrays' storage to the data you store in them, but understanding whether those optimizations kicked in or not might be difficult; thus I just recommend using typed arrays.
GC is high on the profile
Try to understand what your are allocating and (more important) what survives several GCs. The worst kind of object is the one that survives a couple of partial (aka scavenge) collections and then gets thrown away. This kind of workload is the most stressful for GC because it has to copy young objects around constantly. Objects that live long are less stressful (but you have to keep in mind that GC cost is proportional to the number of live objects). The best kind of object is the one that dies shortly after it's allocation. You can use --trace-gc
to see GC pauses and you can use built in heap snapshots to figure out what takes space in your heap. [it might be hard or impossible to capture "middle-aged" garbage with heap snapshots because V8 does full garbage collection before taking snapshot thus effectively killing all such garbage].
JS natives are high on the profile
When I say _natives_ I mean built in methods of String
/Number
/Boolean
/RegExp
/JSON
and global functions like parseInt
etc. Here you can't optimize anything directly but you can try to figure out two things:
- Try calling them less by changing your algorithms and/or fusing them into it. Some of those methods are very generic (e.g. forEach). Some can be fused with your functions (e.g. you have to parse integer contained in some stream: you can either build a temporary string character by character and pass it to parseInt or you can fuse parsing and reading from a stream; later is better)
- Is there some obvious performance problem with them? V8's implementation of the native method can be suboptimal. If you see a bug (or you suspect that it can be improved) please file a bug or write a question to v8-users mailing list.
Some strange V8 internals are high on the profile
In this case you can either read V8's source or send a question to v8-users list.
A lot of time is spent in your C++ code
Sorry this is out of scope. Consult C++ optimization guides :-)
Do you feel that V8's statistical profiler misses hotspot?
Your best bet then is either hardware counters based tool like Linux perf
for which V8 has support (see v8/tools/ll_prof.py --help
for more details) or trying to spot anomalies by some sort of software counters based profiling. V8 has it's own simple software counters subsystem (try passing --native-code-counters --dump-counters
to d8
shell).
Do you want to go deeper?
If you feel that generated code is slow and you can improve it you should definitely check it out using flags --print-code --code-comments
. You can also dump IR used by optimizing compiler with --trace-hydrogen
. IR will be written into hydrogen.cfg
file that can be viewed by C1 Visualizer.
Are you still lost?
Drop a line to me or better to v8-users mailing list. Try your best to provide as much context as possible (a standalone JS benchmark is the best way). It's nearly impossible to diagnose performance problems based on vague descriptions of what you are trying to achieve and how slow it runs.
There is something to be learned from a rainstorm. When meeting with a sudden shower, you try not to get wet and run quickly along the road. But doing such things as passing under the eaves of houses, you still get wet. When you are resolved from the beginning, you will not be perplexed, though you still get the same soaking
— Hagakure by Yamamoto Tsunetomo.
Similarly you have to be resolved from the beginning when you want to optimize your app. Randomly tweaking things in panic here and there does not help.
Understand before you act.