Samurai had bushidō, the way of the warrior: a code of conduct they had to follow. In a similar manner, you have to follow a certain opt-dō if you want to optimize your application. I tried to sketch such a path in my nodecamp.eu talk "Understanding V8", and +Daniel Clifford tried to do the same in his "V8 Performance Tuning Tricks" talk at GDD11 in Berlin. But not everybody has seen those talks, and the question keeps coming back. So I decided to write down a quick checklist for developers who want to optimize their apps. The tl;dr version of my checklist is: "Understand before you act".
Understanding V8 and beyond [talks and posts]
UPDATE Nov 6 2012: More and more talks are being given about optimizing for V8 and understanding its internals, so I decided to maintain a list of them here. If I missed an interesting or useful blog post or talk about any JavaScript VM, please send me a link and I will add it.
- Practical recommendations and optimization walk-throughs for V8
- Understanding V8 (me, nodecamp.eu 2011) [slides]
- V8 Performance Tuning Tricks (+Daniel Clifford, GDD2011 Berlin) [slides]
- Console to Chrome (+Lilli Thompson, GDC 2012) [slides] [video]
- Breaking the JavaScript Speed Barrier with V8 (+Daniel Clifford, Google I/O 2012) [slides] [video]
- Optimizing for V8 (series of blog posts from +Florian Loitsch, based on his experience writing dart2js compiler)
- Performance tips for JavaScript in V8 by +Chris Wilson
- Writing Fast, Memory-Efficient JavaScript by Addy Osmani
- The Footprint of Performance (video): Michael Starzinger describes memory implications of various programming patterns at JSConf EU 2012
- V8 talks (old ones might contain outdated information)
- V8: High Performance JavaScript Engine in Google Chrome (+Kevin Millikin, GDD 2008 London) [video]
- V8 Internals (+Mads Ager, Google I/O 2009) [video]
- Erik Meijer and +Lars Bak on Channel 9: Inside V8 - A Javascript Virtual Machine (2009)
- Crankshaft: Turbocharging the Next Generation of Web (+Kasper Lund, YOW 2011) [video] [slides]
- Fundamentals of V8 and other JS VMs
- Andy Wingo blogs about his adventures in V8's and JavaScriptCore's compilation pipelines
- David Mandelin's (SpiderMonkey TL) talk Know Your Engines (Velocity Conf 2011) [slides] [video].
- I am trying to explain inline-caching used by JavaScript VMs by writing IC in JavaScript, also see my talk "V8 Inside Out" from WebRebels 2012 [slides] [video]
- Building High-Performing JavaScript for Modern Engines: performance recommendations from Microsoft's JavaScript team (tailored to Chakra).
- Miscellaneous
- Can V8 do that?! (me, JSConf 2012) [slides] [video]
- Channel 9: Lars Bak and Steve Lucco: Chakra, V8, JavaScript, Open Source
Do you understand what the application is trying to do and how?
The more you understand about your app, the better you can optimize it. Sometimes a clever algorithm or a cache placed in the right place will yield more improvement than any local tweaking. Understanding your application at large is a very difficult problem which requires special tooling and discipline. I highly recommend reading @coda's Metrics Metrics Everywhere talk if you want to get a glimpse of that world. Sometimes it is possible to split a big application into pieces and optimize them separately, but there is no guarantee that the overall gain will not be lost when those pieces are connected back together.
Did you profile your application with the built-in statistical profiler?
Profiling helps to discover obvious hot spots. Don't waste time rewriting places that occupy 0.0001% of the running time. Concentrate your efforts on those that are high on the profile. If you are using V8's tick processors, keep in mind that the LazyCompile: prefix does not mean that this time was spent in the compiler; it just means that the function itself was compiled lazily. The statistical profiler is not the most accurate tool in the world and might miss overheads that are finely spread across execution (as the sampling interval is 2ms). Tools like dtrace, perf, Instruments and VTune might provide a more fine-grained picture, but they do not necessarily have support for JITed code (see below).
JavaScript function is high on the profile
Ensure that this function is optimized and Crankshaft friendly. V8's tick processing scripts mark optimized functions with * (asterisk) and non-optimized ones with ~ (tilde). You can also use the --trace-opt --trace-deopt flags to see what Crankshaft does with your program. Deoptimizations happen when assumptions made by the compiler do not match the program's runtime behavior; bailouts happen when the compiler can't compile the function with optimizations for some reason. [note that in V8 prior to version 3.13.4 you'll need to supply --trace-bailout to see optimizing compiler bailouts] If you want to understand the ideas behind V8's optimization pipeline, I recommend starting with Andy Wingo's A Tale of Two Compilers post.
In general it's a good idea to know more about modern JavaScript VMs, especially their strengths and weaknesses. I recommend going through David Mandelin's talk Know Your Engines. This talk stresses a very important aspect of modern JS performance: the fastest application is the one that is essentially statically typed in its nature.
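To make the distinction between optimization and deoptimization more concrete, here is a minimal sketch of a function that is warmed up with one kind of argument and then called with another. The names `add` and `run` are hypothetical, and whether V8 actually optimizes and later deoptimizes this particular code depends on the V8 version and its heuristics; treat it as something to experiment with under the tracing flags, not as a guaranteed reproduction.

```javascript
// Hypothetical hot function: the compiler can only speculate about its
// argument types from what it has observed so far.
function add(a, b) {
  return a + b;
}

// Warm-up: call add() many times with small integers only, which gives an
// optimizing compiler a reason to specialize it for integer addition.
function run(iterations) {
  let total = 0;
  for (let i = 0; i < iterations; i++) {
    total = add(total, 1);
  }
  return total;
}

const warm = run(100000);

// A string argument does not match the integer-only behavior seen so far.
// If add() was optimized under that assumption, this is the kind of call
// that may show up as a deoptimization in the trace.
const mixed = add(warm, "!");

console.log(warm, mixed);
```

Saving this to a file and running it with the tracing flags mentioned above (for example through d8, or through node, which passes V8 flags along) should show which functions the optimizing compiler picked up and whether any of them were thrown away again.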
Modern JavaScript VMs try to grasp "static" structure hidden inside dynamic JS code by utilizing hidden classes and inline caches. Take a look at my slide deck to get a basic understanding of how those hidden classes are built and used.
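As a rough sketch of what "static structure" means in practice, compare objects that are always built the same way with objects whose properties are added in varying order. `Point`, `makeLoosePoint` and `lengthSquared` are hypothetical names, and the comments describe the general hidden-class idea, not the exact behavior of any particular V8 version.

```javascript
// Every Point gets the same properties in the same order, so all instances
// can share one hidden class and property accesses can stay monomorphic.
function Point(x, y) {
  this.x = x;
  this.y = y;
}

// Anti-pattern sketch: the same logical object, but the property order
// depends on a runtime flag, so instances may end up with different
// hidden classes even though they "look" identical to the programmer.
function makeLoosePoint(x, y, flip) {
  const p = {};
  if (flip) {
    p.y = y;
    p.x = x;
  } else {
    p.x = x;
    p.y = y;
  }
  return p;
}

// The result is the same either way; what differs is how many object
// shapes the inline caches inside this function have to deal with.
function lengthSquared(p) {
  return p.x * p.x + p.y * p.y;
}

console.log(lengthSquared(new Point(3, 4)));
console.log(lengthSquared(makeLoosePoint(3, 4, true)));
```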
For V8 it is also important to check how you store floating point numbers (and integers that exceed the 31-bit range in the case of the ia32 version of V8) and to use WebGL typed arrays where appropriate. These days V8 tries to adapt generic arrays' storage to the data you store in them, but understanding whether those optimizations kicked in or not might be difficult; thus I just recommend using typed arrays.
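A minimal sketch of the typed array recommendation: the element type is fixed when the array is created, so there is no need to guess which storage the VM picked. The names `samples`, `big` and `sumOfSquares` are hypothetical.

```javascript
// Works on any array-like of numbers, typed or generic.
function sumOfSquares(values) {
  let sum = 0;
  for (let i = 0; i < values.length; i++) {
    sum += values[i] * values[i];
  }
  return sum;
}

// Float64Array stores raw doubles; the element type is part of the array.
const samples = new Float64Array(4);
samples[0] = 0.5;
samples[1] = 1.5;
samples[2] = 2.5;
samples[3] = 3.5;

// Int32Array holds full 32-bit integers, including values outside the
// 31-bit range mentioned above.
const big = new Int32Array(1);
big[0] = 2147483647;

console.log(sumOfSquares(samples), big[0]);
```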
GC is high on the profile
Try to understand what you are allocating and (more importantly) what survives several GCs. The worst kind of object is the one that survives a couple of partial (aka scavenge) collections and then gets thrown away. This kind of workload is the most stressful for the GC because it has to copy young objects around constantly. Objects that live long are less stressful (but you have to keep in mind that GC cost is proportional to the number of live objects). The best kind of object is the one that dies shortly after its allocation. You can use --trace-gc to see GC pauses, and you can use the built-in heap snapshots to figure out what takes space in your heap. [it might be hard or impossible to capture "middle-aged" garbage with heap snapshots because V8 does a full garbage collection before taking a snapshot, thus effectively killing all such garbage].
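The difference between the two kinds of lifetime can be sketched as follows. Both functions are hypothetical and compute the same result; the only difference is how long each temporary object stays reachable. Whether the parked objects really survive a scavenge depends on the window size relative to the young generation, which is an assumption you would have to check with --trace-gc.

```javascript
// Short-lived garbage: each temporary is unreachable as soon as its
// iteration ends, which is the cheapest case for a generational GC.
function sumShortLived(n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    const tmp = { value: i };
    total += tmp.value;
  }
  return total;
}

// "Middle-aged" garbage: each temporary is parked in a sliding window and
// only becomes unreachable windowSize iterations later. With a large
// enough window, objects may be copied by scavenges before they die.
function sumMiddleAged(n, windowSize) {
  const recent = new Array(windowSize).fill(null);
  let total = 0;
  for (let i = 0; i < n; i++) {
    const tmp = { value: i };
    recent[i % windowSize] = tmp; // overwrites (and releases) an older entry
    total += tmp.value;
  }
  return total;
}

console.log(sumShortLived(10), sumMiddleAged(10, 3));
```

Running both variants with a large `n` under --trace-gc is one way to compare how much work the collector reports for each pattern.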
JS natives are high on the profile
When I say _natives_ I mean built-in methods of String/Number/Boolean/RegExp/JSON and global functions like parseInt etc. Here you can't optimize anything directly, but there are two things to consider:
- Try calling them less by changing your algorithms and/or fusing them into your own code. Some of those methods are very generic (e.g. forEach). Some can be fused with your functions (e.g. you have to parse an integer contained in some stream: you can either build a temporary string character by character and pass it to parseInt, or you can fuse parsing with reading from the stream; the latter is better).
- Is there some obvious performance problem with them? V8's implementation of a native method can be suboptimal. If you see a bug (or you suspect that something can be improved), please file a bug or send a question to the v8-users mailing list.
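The integer parsing example from the first bullet can be sketched like this. Both functions are hypothetical, the "stream" is simplified to a string plus a start position, and only unsigned decimal digits are handled.

```javascript
// Variant A: build a temporary string character by character, then hand
// it to the generic parseInt native.
function readIntViaString(input, start) {
  let digits = "";
  let pos = start;
  while (pos < input.length) {
    const code = input.charCodeAt(pos);
    if (code < 48 || code > 57) break; // stop at the first non-digit
    digits += input[pos];
    pos++;
  }
  return { value: parseInt(digits, 10), next: pos };
}

// Variant B: fuse parsing with reading. No temporary string is built and
// parseInt is never called; the value is accumulated digit by digit.
// Note: with no digits at all this returns 0, while variant A returns NaN.
function readIntFused(input, start) {
  let value = 0;
  let pos = start;
  while (pos < input.length) {
    const code = input.charCodeAt(pos);
    if (code < 48 || code > 57) break; // stop at the first non-digit
    value = value * 10 + (code - 48);  // 48 is the char code of "0"
    pos++;
  }
  return { value: value, next: pos };
}

console.log(readIntViaString("abc 1234;", 4), readIntFused("abc 1234;", 4));
```

Whether the fused variant is actually faster in a given application is something the profile has to confirm; the sketch only shows the shape of the transformation.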
Some strange V8 internals are high on the profile
In this case you can either read V8's source or send a question to the v8-users list.
A lot of time is spent in your C++ code
Sorry, this is out of scope. Consult C++ optimization guides :-)
Do you feel that V8's statistical profiler misses a hotspot?
Your best bet then is either a hardware-counter-based tool like Linux perf, for which V8 has support (see v8/tools/ll_prof.py --help for more details), or trying to spot anomalies with some sort of software-counter-based profiling. V8 has its own simple software counters subsystem (try passing --native-code-counters --dump-counters to the d8 shell).
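Software counters can also live at the application level. The sketch below is a hand-rolled illustration and not V8's counters subsystem; `counters`, `bump` and `lookup` are hypothetical names, and the cached computation is a stand-in for real work.

```javascript
// A tiny counter registry: counts named events so that anomalies (for
// example an unexpectedly low cache hit rate) become visible.
const counters = Object.create(null);

function bump(name) {
  counters[name] = (counters[name] || 0) + 1;
}

// Hypothetical cached computation instrumented with hit/miss counters.
function lookup(cache, key) {
  if (cache.has(key)) {
    bump("cache.hit");
    return cache.get(key);
  }
  bump("cache.miss");
  const value = key * 2; // stand-in for an expensive computation
  cache.set(key, value);
  return value;
}

const cache = new Map();
[1, 2, 1, 3, 1].forEach(function (key) {
  lookup(cache, key);
});

console.log(counters["cache.hit"], counters["cache.miss"]);
```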
Do you want to go deeper?
If you feel that the generated code is slow and that you can improve it, you should definitely inspect it using the flags --print-code --code-comments. You can also dump the IR used by the optimizing compiler with --trace-hydrogen. The IR will be written into a hydrogen.cfg file that can be viewed with C1 Visualizer.
Are you still lost?
Drop a line to me or, better, to the v8-users mailing list. Try your best to provide as much context as possible (a standalone JS benchmark is the best way). It's nearly impossible to diagnose performance problems based on vague descriptions of what you are trying to achieve and how slowly it runs.
There is something to be learned from a rainstorm. When meeting with a sudden shower, you try not to get wet and run quickly along the road. But doing such things as passing under the eaves of houses, you still get wet. When you are resolved from the beginning, you will not be perplexed, though you still get the same soaking
— Hagakure by Yamamoto Tsunetomo.
Similarly, you have to be resolved from the beginning when you want to optimize your app. Randomly tweaking things here and there in a panic does not help.
Understand before you act.