Measuring mobile performance is hard
When Amazon announced their Silk browser I got excited reading about the “split architecture”. I’m not big on ereaders but I pre-ordered my Kindle Fire that day. It arrived a week or two ago. I’ve been playing with it, trying to find a scientific way to measure page load times for various websites. It’s not easy.
- Since it’s a new browser and runs on a tablet, we don’t have plugins like Firebug.
- It doesn’t (yet) support the Navigation Timing spec, so even though I can inspect pages using Firebug Lite (via the Mobile Perf bookmarklet) and Weinre (I haven’t tried it but I assume it works), there’s no page load time value to extract.
- Connecting my Fire to a wifi hotspot on my laptop running tcpdump (the technique evangelized by pcapperf) doesn’t work in accelerated mode because Silk uses SPDY over SSL. This technique works when acceleration is turned off, but I want to see the performance optimizations.
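For browsers that do support the Navigation Timing spec, the page load time can be read directly from `window.performance.timing`. Here’s a minimal sketch (the helper function name is mine, not part of any spec or tool mentioned above):

```javascript
// Hypothetical helper: derive page load time from a Navigation Timing
// object (window.performance.timing in supporting browsers).
function getLoadTime(timing) {
  // loadEventEnd is 0 until the load event has finished firing.
  if (!timing || !timing.loadEventEnd) {
    return null; // not supported, or the page is still loading
  }
  return timing.loadEventEnd - timing.navigationStart;
}

// In a supporting browser you might call it just after onload, e.g.:
//   window.addEventListener("load", function () {
//     setTimeout(function () {
//       console.log(getLoadTime(window.performance.timing) + "ms");
//     }, 0); // setTimeout so loadEventEnd has been recorded
//   });
```

Silk doesn’t expose this yet, which is exactly why an external measurement technique is needed.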
While I was poking at this problem a bunch of Kindle Fire reviews came out. Most of them talked about the performance of Silk, but I was disappointed by the lack of scientific rigor in the testing. Instead of data there were subjective statements like “the iPad took about half as long [compared to Silk]” and “the Fire routinely got beat in rendering pages but often not by much”. Most of the articles did not include a description of the test procedures. I contacted one of the authors who confided that they used a stopwatch to measure page load times.
If we’re going to critique Silk and compare its performance to other browsers we need reproducible, unbiased techniques for testing performance. Using a stopwatch, or loading pages side-by-side and eyeballing which finishes first, is not a reliable way to measure performance. We need better tools.
Anyone doing mobile web development knows that dev tools for mobile are lacking. Firebug came out in 2006. We’re getting close to having that kind of functionality in mobile browsers using remote debuggers, but it’s pretty safe to say the state of mobile dev tools is 3-5 years behind desktop tools. It might not be sexy, but there’s a lot to be gained from taking tools and techniques that worked on the desktop and moving them to mobile.
In that vein I’ve spent the last few days building an iframe-based test harness similar to one I built back in 2003. I call it Loadtimer. (I was shocked to see this domain was available – that’s a first.) Here’s a screenshot:
The way it works is straightforward:
- It’s preloaded with a list of popular URLs; the list can be modified.
- The URLs are loaded one-at-a-time into the iframe lower in the page.
- The iframe’s onload time is measured and displayed on the right next to each URL.
- If you check “record load times” the page load time is beaconed to the specified URL. The beacon URL defaults to point to loadtimer.org, but you can modify it if, for example, you’re testing some private pages and want the results to go to your own server.
- You can’t test websites that have “framebusting” code that prevents them from being loaded in an iframe, such as Google, YouTube, Twitter, and NYTimes.
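The core of this approach fits in a few lines of JavaScript. This is an illustrative sketch, not Loadtimer’s actual source – the function names, the `testframe` id, and the beacon parameters are all assumptions:

```javascript
// Build the beacon query string for one measurement.
function beaconQuery(url, loadTime) {
  return "?url=" + encodeURIComponent(url) + "&time=" + loadTime;
}

// Load one URL into the iframe and time it via the iframe's onload.
// Assumes the page contains <iframe id="testframe"></iframe>.
function timeUrl(url, callback) {
  var iframe = document.getElementById("testframe");
  var start = new Date().getTime();
  iframe.onload = function () {
    var loadTime = new Date().getTime() - start;
    iframe.onload = null;
    callback(url, loadTime); // e.g., display it and/or beacon it
  };
  iframe.src = url;
}

// Report a result as an image request so no response handling is needed.
function beacon(beaconUrl, url, loadTime) {
  new Image().src = beaconUrl + beaconQuery(url, loadTime);
}
```

Beaconing as an image request is a common pattern for sending measurement data: the browser fires the GET and the harness never has to read a response.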
There are some subtle optimizations worth noting:
- You should clear the cache between each run (unless you explicitly want to test the primed cache experience). There’s no way for the test harness to clear the cache, but it does have a check that helps remind you to clear the cache. (It loads a script that is known to take 3 seconds to load – if it takes less than 3 seconds it means the cache wasn’t cleared.)
- It’s possible that URL 1’s unload time could make URL 2’s onload time longer than it actually should be. To avoid this, about:blank is loaded between each URL.
- The order of the preset URLs is randomized to mitigate biases across URLs, for example, where URL 1 loads resources used by URL 2.
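The last two points can be sketched in code. This is my illustration of the general technique, not Loadtimer’s actual implementation: a Fisher–Yates shuffle for the URL order, and an about:blank interstitial between measurements:

```javascript
// Fisher-Yates shuffle (in place) to randomize the URL test order.
function shuffle(urls) {
  for (var i = urls.length - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1));
    var tmp = urls[i];
    urls[i] = urls[j];
    urls[j] = tmp;
  }
  return urls;
}

// Between measurements, load about:blank and wait for its onload before
// starting the next URL, so the previous page's unload work is finished
// (iframe is an assumed reference to the harness's iframe element):
//   iframe.onload = function () { /* start timing the next URL */ };
//   iframe.src = "about:blank";
```

Randomizing the order means any ordering bias (shared resources, network warm-up) is spread across URLs rather than consistently favoring the ones tested later.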
Two biases that aren’t addressed by Loadtimer:
- DNS resolutions aren’t cleared. I don’t think there’s a way to do this on mobile devices short of power cycling. This could be a significant issue when comparing Silk with acceleration on and off. When acceleration is on there’s only one DNS lookup, whereas when acceleration is off there’s a DNS lookup for each hostname in the page (13 domains per page on average). Having the DNS resolutions cached gives an advantage to acceleration being off.
- Favicons aren’t loaded for websites in iframes. This probably has a negligible impact on page load times.
Have at it
There’s also a results page. If you select the “record load times” checkbox you’ll be helping out by contributing to the crowdsourced data that’s being gathered. Getting back to what started all of this, I’ve also been using Loadtimer the last few days to compare the performance of Silk to other tablets. Those results are the topic of my next blog post – see you there.