Posted by dohertyjf
This post was originally in YOUmoz, and was promoted to the main blog because it provides great value and interest to our community. The author's views are entirely his or her own and may not reflect the views of SEOmoz, Inc.
The advent of social media has brought a host of changes to the SEO industry, and online marketing as a whole. You would be hard-pressed to find a business with a decent online presence that does not have a Twitter account connected to its website, and at least one way to find that account.
Google and Bing recently alerted the SEO industry to the fact that they are using social media signals as a factor in ranking websites. Danny Sullivan of Search Engine Land wrote a post back in December 2010 discussing which social signals he thought the search engines were using and would use.
Here is an excerpt from that article:
Bing:
We do look at the social authority of a user. We look at how many people you follow, how many follow you, and this can add a little weight to a listing in regular search results. It carries much more weight in Bing Social Search, where tweets from more authoritative people will flow to the top when best match relevancy is used.
Google:
Yes, we do use it as a signal. It is used as a signal in our organic and news rankings. We also use it to enhance our news universal by marking how many people shared an article...
Jen Lopez of SEOmoz also wrote an article called "A Tweet's Effects on Rankings." In this article, she mentioned how Smashing Mag had tweeted about SEOmoz's Beginner's Guide to SEO. After this tweet, the Beginner's Guide to SEO page on SEOmoz jumped to #4 in the SERPs for "Beginner's Guide." The ranking has bounced around since, sometimes dropping to the second page. At the time of writing this post, when logged out of Google and using an Incognito window in Chrome, it still sits at #4.
My Research
After these interesting studies and admissions by the search engines, I decided to do a study of my own, using both my own Twitter account and that of the website I worked on as an in-house SEO.
Study #1
I wrote an article on February 15th about the then-upcoming Distilled Linkbuilding Conference in London. Tom and Will Critchlow tweeted the link to my article, which was on a relatively new domain at the time (my personal site), to their followers. Tom also posted a correction, with the URL still in place. Lynsey Little, the event manager at Distilled, retweeted Tom's correction and also my tweet about the post being updated to reflect the true state of the New Orleans Conference. Topsy.com's summary of the tweets listed both Tom and Will as "very influential." This was an "aha" moment.
Since the post went live, it has been at #3 or #4 for the search "distilled linkbuilding london." It is in similar places for "distilled linkbuilding conference." For "distilled linkbuilding" it sits at around result #7.
Summary and analysis: The article URL was tweeted a total of seven (7) times, three times by influential followers. It was retweeted five (5) of those times. I surmise that the number of influential tweets, the number of retweets, and the fact that the search terms are not very competitive are the reasons my post still ranks so highly.
Study #2
I worked for an online college portal website, which two other SEOs and I worked on daily. We ranked well for some competitive terms, so I was interested to see what would happen if I started tweeting the phrase "accredited online colleges only" using the website's Twitter account. I also decided to retweet one of the tweets from my personal account, to see if that had any effect.
Here are the rankings before the tweets:
Bing: 2
Google: 5
Here are the rankings after the tweets:
Bing: 2 and 3
Google: 5
One week later, and after the Google Farmer/Panda content farm update, here are the rankings:
Bing: 2 and 3
Google: 6
The page was already ranking in the middle of the first page for Google and at #2 for Bing. After the tweets from a non-influential account, no noticeable change occurred, except that two pages from the website began appearing in Bing.
Summary and analysis: The search term is a rather competitive term, so it is not a surprise that a couple of tweets from non-influential Twitterers would not affect the rankings. I do not know if the tweet had an effect on Bing's decision to show two results instead of one for the query.
Study #3
I wrote a summary of a New York Times article about the Fiske Guide (a list of colleges and universities on the Internet) developing an app for the iPad, which makes the guide interactive and useful for high school seniors and their parents. I titled the blog post "NY Times Summary: The Fiske Guide Goes iPad". After I wrote the article, I tweeted it using the work account. I also submitted it to StumbleUpon.
After the tweet and StumbleUpon submission, here were the rankings for the article for the search query "Fiske Guide", which I performed while logged out of Gmail, in an incognito window in Chrome, and using a Google location-independent query:
Google: 9
Bing: 56
Here is a snapshot of the traffic, which started from the first day.
One week later and after the Farmer/Panda update, here were the rankings:
Google: Page 13
Bing: 54
Summary and analysis: I think this one was caught by the Farmer/Panda update, because the ranking tanked after the update. Long-term ranking is inconclusive because of the algorithm update, but the trend holds true that an initial tweet helps a new article to be indexed and rank quickly.
Study #4
Now, here is where it gets interesting. To test whether tweets had an effect on rankings, I decided to write another article called "What Is The Fiske Guide?" on the company blog. I then decided that I would wait a couple of days before tweeting it with the work account.
Two days after the article was published, with no tweet, here were the rankings for "Fiske Guide", using the same search setup as above:
Google: not found
Bing: not found
I then decided to tweet the article to see what might happen. After the "no tweet" article was tweeted, here were the rankings:
Google: 35
Bing: n/a
One week later and after the Farmer update, the rankings had changed a bit:
Google: 8
Bing: 58
Summary and Analysis: I purposefully did not tweet out an article that I thought had a better chance of ranking than the first article. I let the article sit for two days, and it was not to be found in the search results. After the article was tweeted, it took a bit of time, but the article eventually made its way onto the first page of the Google search query "Fiske Guide."
Study #5
Once again, I needed to test and see if my suspicions were correct about social media signals helping articles to get discovered initially. I wrote a blog post called "Top 15 Inspirational Business Quotes", which I then tweeted.
It was published on a Friday. After 4 tweets, consisting of one (1) from us, one (1) from a follower, and retweets from two (2) of her followers, here were the rankings:
Google: 1
Bing: 2
And here was the organic traffic, which started on the day of publication:
Six days later, on a Thursday, the rankings were the same:
Google: 1
Bing: 2
The blog that I was writing on is fairly high-traffic, so Mr. Googlebot crawled it frequently. I noticed, however, that when I did not tweet an article, it would often take 2-3 days before it was crawled. When I did tweet the article, I received a Google Alert (I have one set up for the website) a couple of hours later, which showed that it had been discovered and indexed by Google.
Study #6
"Domain Trust Factors" (more competitive)
I wrote an article entitled "Four Factors that influence Domain Trust" and tweeted it to the world. I tweeted the URL three (3) times, and it was retweeted twice.
After the tweets from me and 2 retweets, here were the rankings for "Domain Trust Factors":
Bing: not found
Google: 48
It currently resides somewhere in the middle of the 5th page on Google. It was not found on Bing for a while because of issues with my CMS.
Analysis and Conclusion: The article was indexed quickly, but it did not and does not rank well. From the previous examples, the ranking trend makes sense, because the term is more competitive, and the indexing trend also holds true.
Conclusion
I have come to believe that tweets, and possibly other social media signals, are becoming increasingly important for the search engines when trying to discover new material on websites. This has held true for the couple of websites that I administer.
When new articles are tweeted, they are discovered and indexed quickly. When they are not tweeted, it takes the search engine bots more time to find and index them. This leads me to believe that search engines are watching Twitter feeds for indexation purposes, and when tweeters or retweeters are influential, they are using that information for ranking the articles.
The number of tweets and the number of tweeters, however, seem to make a difference for ranking. Articles that were tweeted just once have reached a maximum of result #8. The two that were tweeted more times have ranked and are ranking higher. From this I think we can safely assume that the more times a new article or page gets tweeted, the better chance the target URL has of ranking well.
Three Takeaways
- Always post the link on Twitter when you publish a new article. This is common sense for SEOs, but we should also recommend this to clients.
- If possible, have 2-3 or more people who will always tweet your links. Since my findings show that the number of tweets may positively affect rankings, the more tweets you have guaranteed, the better chance your article will have of ranking, even for only a short period of time.
- To bump an established link up in rankings, it seems necessary that the tweet come from a well-respected, influential, and relevant Twitter account. Of course, when the tweet comes from a respected account, it will often be retweeted numerous times (126 at last count, according to Topsy) and clicked many times (over 9,800 at last check).
If you have done any testing into a tweet's effect on rankings, please leave your findings in the Comments section below!
About the author: John Doherty is the newest member of Distilled NYC. You can find him on Twitter: @dohertyjf.
Posted by jennita
Come on now, you didn't think I'd spill the beans just like that, did you? Before I give you the goods, let me refresh your memory in case this is the first time you're hearing about MozCation. A few weeks ago, we asked people to create some sort of unique content and tell us why we should have an SEOmoz Meetup (aka MozCation) in their city.
Overwhelming Response
The nominations started rather slowly, and at first we thought that perhaps MozCation would be a complete failure! Soon a number of nominations started trickling in; however, people weren't creating new content. They were simply linking to city government websites and Wikipedia pages... which wasn't what we were looking for. There was an email thread going around internally about how we needed to write another post explaining better what we were looking for.
But then the first "real" nomination came in from Peru. We were delighted and quite surprised; we were unaware we had a following in Peru (little did we know), and the retweets started rolling in. Soon after, Portsmouth, NH was nominated, and then Calgary. By this time, we were excited that we were getting some unique submissions and figured we'd have a handful of cities to pick from. What we didn't realize was that the next nomination would be for Spain, and it would change the whole game. Gianluca had bought the domain mozcation.com and built an entire site! Holy game changer. Soon after, people started creating Twitter and Facebook accounts, building websites and rallying their local communities. Check out all the submissions below. :)
Picking A City
Whew. Now, how on earth do we pick one city out of all the great nominations we received? Well, we decided we didn't have to. We're the keepers of the rules, and in true TAGFEE fashion, we decided to pick four cities instead. :) I don't want to ruin all the fun, so I'll just let Rand spill the beans:
(Be sure to watch the entire video!)
The Nominees
Take a look at all 15 nominated cities in no particular order. Everyone worked really hard on these sites, pages, videos, etc. I know they'd love for you to take a peek at all their hard work.
Title: Why SEOmoz Should Come To Denver For Mozcation: A Love Story
Nominated by: @Location3
Title: The MozCation should definitely add Houston, TX to its list of stops
Nominated by: @CarterCole
Other Nominations: MozCation:Houston from @juliaalaniz and Mozcation in Houston from @googleismybf
Title: MozCation - Salt Lake City
Nominated by: @Sams_Antics
Other Nominations: 1,276 Reasons #Mozcation is Coming to SLC from @shuey03 and Come To Salt Lake City, Mozcation! from @davidscoville and This Is The Place...For a MozCation from @iamadrianlazo
Title: Hey Roger! Visit Your Filipino Fans, SEOmoz Buddies in Manila
Nominated by: @seoteky
Now What?
Yes, we chose four cities, but we will still be holding at least one more round of nominations for another MozCation later in the year. So if you missed out this time, or if your city wasn't chosen, keep a watch on the blog! We'll be announcing open nominations again in a couple of months. So stay tuned! We'll also keep you up-to-date on dates/times for all the MozCation events coming up.
Thank You
Never did we imagine that we'd get so many nominations. The sheer number of tweets that came in for #MozCation was enough to humble us. Thank you to everyone who worked hard to nominate their city. You've made us so proud to be a part of this amazing community. We look forward to seeing you all!
In an effort to increase the transparency and frequency of our release cycles, the YUI team has been trying out some new processes. Over the coming months, we’ll be posting more granular information about our development sprints, and we’ll also be providing regular preview releases to the community for testing. Today’s release is the first in a series of previews along the way to the 3.4.0 GA release. Without further ado, YUI 3.4.0 PR1 is now available via CDN: http://yui.yahooapis.com/3.4.0pr1/build/yui/yui-min.js.
We would appreciate any early testing by our community. The list of issues addressed in this release is available here, and you can file any new bugs and regressions you find here (you must be logged in to file a ticket).
As a very brief intro, the Guardian Developer Network Drop In (GDNdi) is explained here, and the next one is announced here. As a result of the May drop-ins, Alan Rusbridger invited three of the developers to speak at the Guardian morning conference about their projects and what they were working on: Anna Powell-Smith’s work on the Domesday Book and the Domesday Map, Angus Fox’s work on Multizone social mobile apps for UK Police, and Rob Mckinnon’s Who’s Lobbying. All three projects are self-funded/delivered on a shoestring but driven by the developers’ own passion for the subject. Many other developers dropped in and will feature if they fancy, but these three were a lovely start.
The burning question after the morning conference was: ‘how can I learn to code?’
The following is advice received from Anna and Rob. I can lay no claim to it other than asking the question, getting the answer and permission to blog about it, but it is such a common question that I thought it was important to share their advice, as it can help everyone (who wants to know).
As far as programming languages go for the beginner we recommend Python or Ruby above anything else, certainly!
Learn Python the Hard Way: http://learnpythonthehardway.org (a *FREE* book with structured exercises). Paul Bradshaw, a journalist who has learned to code, used this and recommended it highly.
There’s a nice quote in the afterword to the book:
“Programming as a profession is only moderately interesting….. You are much better off using code as your secret weapon in another profession. People who can code in biology, medicine, government, sociology, physics, history, and mathematics are respected and can do amazing things to advance those disciplines.”
There’s an online Ruby tutorial here: http://TryRuby.org/
The ScraperWiki tutorials are good for doing some practical scraping (Anna Powell-Smith wrote a bunch of them): http://scraperwiki.com/about/
Once you’ve done the book/tutorial, you should pick a real-world problem you want to solve, find a tame coder, and just do it.
In the spirit of “open everything” and hopefully the benefit you have gained from this advice, please do champion Anna and Rob’s projects as well as those of any developers you know. And if you are a developer, please do come along to our drop-ins, you can work in peace but we also have lots of things happening at the moment in the Guardian if you want to know a bit more.
There is of course the fabulous hacks and hackers for those journalists who are keen on being a part of a community hungry to learn more.
This blog is aimed primarily at Estate Agent Today and readers of the same following their recent article lamenting Twitter. See our take on it of a few days ago.
Despite numerous complaints, comments and tweets, I saw only one reply from the EAT, and that was within another article sarcastically saying they had had their wrists slapped for speaking about Twitter etc. Frankly, this is not good enough, and whilst the EAT may be OK for regurgitating news, agents should be wary of attaching any weight to such articles that speak of what they plainly do not understand.
In a small effort to actually support the position I have taken with some logical reasoning, the following are my ramblings on the subject.
So, to kick off: how woefully naive it is to say Twitter is a ‘waste of time’. Kids say ‘this is a waste of time…’ when their computer won’t log on in a nanosecond. I am not saying every agent has to use Twitter, but I am saying every agent ought to consider using it.
The Americans are of course ahead of the game and if they saw the EAT article I suspect they too would be astounded. Here is a summary of a recent FOREM post about being successful on Twitter.
They make the point that there is more to Twitter than just how many people follow you or how many you follow. It is about communicating with those people and building relationships and a network (remember those networking evenings?), finding the right people to communicate with and progressing with new contacts. It's what you used to do, just in a different way.
But where is the ROI? I see some say (on the EAT, admittedly) ‘..I haven’t sold a house from it…’ and ‘..I have some followers but don’t see the point..’. ROI is pushed out matter-of-factly without examining what it actually means. The investment is your time, and just owning a Twitter account will not bring you a house sale; you have to put some effort in, and even then the intention of Twitter is not to sell a house per se! It is more of a flow of communication leading to greater awareness, brand presence and thus more business, or perhaps just ‘better’ business.
Social Media and twitter especially is about people. People talking about you, recommending you, talking to you. It is advertising money cannot buy.
Doubting Thomases such as the EAT just broadcast (as many commentators have stated): they don’t like Twitter but put a Twitter page up nonetheless and then just broadcast. They don’t want to build a social buzz around them. But as agents, you all have a unique opportunity to embrace this space.
The Americans hand out Facebook cookies to promote their Facebook pages at open days. Now, before you say how twee, just remember we are only now getting used to open days, for crying out loud. Imagine how far ahead of the game you would be if you were innovative enough to come up with some ideas around a similar theme. I am not saying you have to go and bake cookies; I am just indicating that by embracing any idea that encourages social media, you could have perhaps 50 people in and around the area all talking about you and your practice. And what would they do…? Yes, they would tweet about it!!
Anyway, the possibilities are there for those who want to take it.
I run a number of businesses, and all have Twitter accounts, each acting in a slightly different manner but all working, and yes, I have made direct sales through Twitter.
Forem go on to give 5 secrets to successful tweeting as follows:
1. Don’t use all 140 characters. Allow space for people to comment when retweeting
2. Retweet a lot. It is all about the retweet.
3. #FF – for those who don’t know this is follow Friday which means start with the hashtag and FF and then name a few people who you have liked during the week and announce it on Friday. (get used to using hashtags as well) – I am guilty of not doing this enough
4. Post content you like to read. “oh, that is of interest” post it and share it.
5. don’t spend hours thinking about what to say.
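Tip 1 is easy to put a number on. The JavaScript sketch below assumes the old-style manual retweet, where a retweeter prefixes your tweet with “RT @yourhandle: ”; the handle `youragency` and the 20 characters of comment room are hypothetical values, not figures from Forem.

```javascript
// Sketch: how long a tweet can be if you want retweeters to keep it intact
// AND have room for their own comment. Assumes the manual "RT @handle: "
// retweet prefix; handle and comment room are hypothetical inputs.
const TWEET_LIMIT = 140;

// Characters taken up by the "RT @handle: " prefix a retweeter adds,
// where `handle` is YOUR handle (the original author's).
function retweetOverhead(handle) {
  return `RT @${handle}: `.length;
}

// Longest original tweet that still leaves `commentRoom` characters free.
function maxTweetLength(handle, commentRoom) {
  return TWEET_LIMIT - retweetOverhead(handle) - commentRoom;
}

console.log(maxTweetLength("youragency", 20)); // 104
```

A short handle buys you a few characters, but the comment room is what makes people comfortable adding their own take.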
My points
1. Be yourself and behave as you would in public.
2. Do not constantly tweet property details
3. Twitter works best at local level (for me I have found this). I have honestly met more people through twitter than I could have hoped to have met ordinarily. What does this all lead to….. yes, business!
4. Don’t be afraid to put yourself out there. I have long bemoaned the secrecy and culture of non communication between agents and third parties. Shake all that off and get amongst it.
5. A huge % of the population is on Twitter.* Your next generation of customers are habitual users of social media, and so should you be.
6. Be courteous and polite to others.
7. One of the strongest points is that Google and Bing take into account your presence on Twitter when determining search rankings. Do I need to repeat this?!
Anyway, I am off to submit my blog to the EAT, do you think they will publish it?
* I have seen stats everywhere, but let's just stick to huge, large, big, enormous etc.
Let me try to set the mood for Velocity 2011 with an attempt to publish one performance-y post a day for the seven days between now and when the conference starts.
#7 - Lazy HTML evaluation
#6 - Preload in visual search suggestions
#5 - perfplanet.com is open
#4 - YSlow 2.0: the first sketches
#3 - Book of Speed
#2 - Sultans of Speed
#1 - Overlooked Optimizations: Images - guest post by Billy Hoffman
Tada! Velocity is here!
#7 This post is part of the Velocity countdown series. Stay tuned for the articles to come.
Some time ago Google talked about using a sort of lazy JavaScript evaluation which especially helps mobile devices. The idea was to comment out a chunk of JavaScript you don't need right away and serve it this way. Later, when you need it, you get the content of the commented code and eval() it. More here and here.
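A minimal sketch of the idea, assuming the deferred code is available as a string (in a page it might be the text of a `<script>` element). `evalLazy` and `lazySource` are hypothetical names; this illustrates the technique and is not Google's actual implementation.

```javascript
// Deferred code ships inside a /* ... */ comment, so the engine only has to
// skim it as a comment on page load. Later it is unwrapped and evaluated.
const lazySource =
  "/* function greet(name) { return 'Hello, ' + name; } greet('mobile'); */";

function evalLazy(commented) {
  const start = commented.indexOf("/*");
  const end = commented.lastIndexOf("*/");
  if (start === -1 || end === -1 || end < start) {
    throw new Error("no comment-wrapped code found");
  }
  // Keep only what is between the comment markers.
  const code = commented.slice(start + 2, end);
  // Indirect eval runs the code in global scope, much like a <script> would,
  // and returns the value of the last expression statement.
  return (0, eval)(code);
}

console.log(evalLazy(lazySource)); // Hello, mobile
```

The cost of parsing and compiling the deferred code is paid only when `evalLazy` is called, which is the whole point on slow mobile CPUs.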
At the last Fronteers conference I had the pleasure of chatting with Sergey Chikuyonok, who is so great and (among other things) is responsible for coming up with zen coding and writing a bunch of deep articles on image optimization for Smashing Magazine. He told me he had experimented with similar lazy HTML evaluation and it proved to be incredibly helpful for mobile devices. Not only is the overall experience faster, but the initial rendering happens sooner, and we all know how important that is.
Sergey is a busy person and chances of him writing about his experiment in English seemed pretty low at the time, so I decided to do an experiment on my own and see what happens. Meanwhile he did write about it so I forgot all about my findings, but here they are now.
Long document
I took one big HTML document - The adventures of Sherlock Holmes, which is half a megabyte or about 200K gzipped. Page A is the document as-is, plus some JS for measurements.
Page B (lazy) is the same page but with about 95% of its content commented out. The remaining 5% is a whole chapter, so there's plenty of time to deal with the rest while the user is reading. After onload and a 0-timeout I take the commented markup (conveniently placed in <div id="lazy-daze">) and strip the comment markers. Then I take the "unwrapped" time after another 0-timeout to let the browser repaint the DOM and regain control.
The overall skeleton of the lazy page is like so:
<!doctype html>
<html>
<body>
<h1>THE ADVENTURES OF<br/> SHERLOCK HOLMES</h1>
...
... to chat this little matter over with you.</p>
<div id="lazy-daze"><!--
<p>II.</p>
<p> At three o’clock precisely ...
... she has met with considerable success.</p>
--></div>
<script>
window.onload = function () {
  // wait for onload to finish before touching the DOM
  setTimeout(function () {
    var daze = document.getElementById('lazy-daze'),
        inner = daze.innerHTML;
    // drop the leading "<!--" (4 chars) and the trailing "\n-->" (4 chars);
    // this relies on the comment opening right after the <div> tag as laid out above
    daze.innerHTML = inner.substring(4, inner.length - 4);
    setTimeout(function () {
      // take end time...
    }, 0);
  }, 0);
};
</script>
</body>
</html>
Experiment
All the files are here:
http://www.phpied.com/files/lazyhtml/
We have the plain normal document - http://www.phpied.com/files/lazyhtml/sherlock-plain.html
And the lazy one - http://www.phpied.com/files/lazyhtml/sherlock-lazy.html
In order to run the experiment you just go to
http://www.phpied.com/files/lazyhtml/start.html
And click "Go nuts". This will load each of the two documents 20 times and take a few time measurements. "Go nuts" again and you'll get 20 more data points.
The time measurements I take are:
- "plain" - unload to onload of the base version
- "lazy" - unload to onload of the lazy version NOT including unwrapping it. This should be quicker than the plain version
- "unwrapped" - unload to onload plus time to unwrap and rerender - this is expected to be bigger than "plain" because the browser has to render twice and is therefore doing more work
- DOM loaded "plain" - unload to DOMContentLoaded instead of onload
- DOM loaded "lazy"
Then I take the same 5 measurements but instead of starting at unload of the previous page, it starts at the top of the documents, as soon as a timestamp can be taken with JavaScript. This will exclude DNS, establishing connection, time to first byte...
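For anyone re-running the test, here is one hypothetical way to reduce a batch of loads to the metrics above. The field names (`unloadStart`, `docTop`, `domLoaded`, `onload`, `unwrapped`) and the use of the median are assumptions of this sketch; start.html may record and aggregate the numbers differently.

```javascript
// Hypothetical aggregation of one "Go nuts" batch. Each run is assumed to
// carry millisecond timestamps; the median damps outliers such as a slow
// first load. Field names are illustrative, not taken from start.html.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function summarize(runs) {
  const metric = (f) => median(runs.map(f));
  const hasUnwrap = runs.every((r) => typeof r.unwrapped === "number");
  return {
    // previous page's unload -> onload (includes DNS, connect, time to first byte)
    onloadFromUnload: metric((r) => r.onload - r.unloadStart),
    domFromUnload: metric((r) => r.domLoaded - r.unloadStart),
    // timestamp at the top of the document -> onload (request overhead excluded)
    onloadFromTop: metric((r) => r.onload - r.docTop),
    domFromTop: metric((r) => r.domLoaded - r.docTop),
    // lazy page only: until the commented markup is unwrapped and repainted
    unwrappedFromUnload: hasUnwrap ? metric((r) => r.unwrapped - r.unloadStart) : null,
    unwrappedFromTop: hasUnwrap ? metric((r) => r.unwrapped - r.docTop) : null,
  };
}

// Made-up timestamps, only to exercise the code (not measurements).
const batch = [
  { unloadStart: 0, docTop: 450, domLoaded: 900, onload: 1300, unwrapped: 3000 },
  { unloadStart: 0, docTop: 500, domLoaded: 950, onload: 1250, unwrapped: 2900 },
];
console.log(summarize(batch));
```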
Results
Here are the results from back when I did the experiment originally last year, using iPhone 2 (with iOS 3.2 or thereabouts)

I ran this experiment over Wifi and again over 3G.
First striking thing - it takes about the same time to load the plain old page over Wifi and over 3G. For the smaller, "lazy" document, there is a difference, but there's virtually none for the plain base page. The guess here is that the rendering and its cost in terms of memory and CPU is far greater than the actual download time. In other words, it takes longer to render the HTML than it does to download it. At least in this class of phones. This guess is confirmed when you look at the time from the top of the documents, when the request overhead is removed:

With or without the request time - it's all pretty much the same.
The next striking thing - and how about that lazy document! It renders 3-4 times faster than the whole plain document. Not bad.
And one more surprise - lazy+unwrap time is less than the plain old document. Now that's interesting. It appears faster to split the task into two and do the whole double-rendering, which should've been slower because it's extra work. I guess that poor phone really chokes on the long document.
I found the same is true in Firefox, but there the difference is almost negligible.
iPhone 4
I repeated the experiment tonight on iPhone 4 and wifi. And boy, is there a difference. What used to take 13 seconds is now under 3s.
The lazy + unwrap time is more than the plain time, which was to be expected.
Rendering that initial lazy document is still 2-3 times faster than waiting for the whole document.
The numbers (in ms):
- 2765 plain (2014 DOM)
- 1268 lazy
- 2995 lazy+unwrap
Ignoring the request overhead (in ms):
- 2200 plain (1421 DOM)
- 715 lazy
- 2423 lazy+unwrap
And one last run/observation: on 3G with the iPhone 4 and an empty cache, there isn't much benefit from lazy evaluation. The request seems much more expensive: unload to onload is 4.9s, whereas document top to onload is 2.5s. When the request overhead is out of the picture, then lazy eval wins again: 1.7s compared to 2.5s.
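A quick sanity check on those ratios, using only the numbers reported above (milliseconds). It assumes the 3G figures of 4.9s and 2.5s both refer to the plain page.

```javascript
// Arithmetic on the published iPhone 4 numbers; no new measurements.
const iphone4Wifi = {
  withRequest: { plain: 2765, lazy: 1268, unwrapped: 2995 },
  fromDocTop: { plain: 2200, lazy: 715, unwrapped: 2423 },
};

const speedup = (t) => t.plain / t.lazy;         // how much sooner the lazy page is ready
const unwrapCost = (t) => t.unwrapped - t.plain; // extra total time paid for rendering twice

for (const [label, t] of Object.entries(iphone4Wifi)) {
  console.log(`${label}: lazy ${speedup(t).toFixed(2)}x faster, unwrap adds ${unwrapCost(t)} ms`);
}
// withRequest: lazy 2.18x faster, unwrap adds 230 ms
// fromDocTop: lazy 3.08x faster, unwrap adds 223 ms

// 3G: request overhead = (unload -> onload) - (document top -> onload),
// assuming both figures are for the plain page.
const threeG = { unloadToOnload: 4900, topToOnloadPlain: 2500, topToOnloadLazy: 1700 };
console.log(`3G request overhead: ${threeG.unloadToOnload - threeG.topToOnloadPlain} ms`);
// 3G request overhead: 2400 ms
```

So on Wi-Fi the first screen arrives roughly 2.2-3.1x sooner, while the full unwrap costs only about a quarter of a second extra; on 3G the request overhead alone is comparable to the whole render.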
Parting words
- Lazy HTML FTW?
- Who the heck loads an entire book in a page?! Well it may happen. It may not be a whole book, but just a lot of markup. The entire book gzipped was 219K. A hefty document, but have you seen some of those news sites?
- Possible use case - blog comments. Lots and lots of blog comments. Or posts.
- If you're going to lazy-load something and get it with an ajax request, why not save yourself the request and ship it as another chunk of HTML?
- This was a simple layout task. Just a bunch of text. I'm guessing there could be much more complicated pages and layouts to render. And rendering is what takes the time it seems.
- Drawbacks aplenty because of the hidden content: accessibility, SEO.
Thoughts? Anyone want to run the test on Android or any other phone/device/tab/pad/whathaveyou? The guess is that the newer and more powerful the device, the smaller the difference. But it will be nice to know.
Note to self :
when VirtualBox acts up again:
[sdog@stillmine ~]$ VBoxManage list runningvms
Now that flash storage is becoming more popular, the IO alignment question keeps popping up more often than it used to when all we had were rotating hard disk drives. I think the reason is very simple: when systems had only one bearing hard disk drive (HDD), as in RAID1, or one disk drive at all, you couldn’t really have misaligned IO, because HDDs operate in 512-byte sectors and that’s also the smallest amount of disk IO that systems can do. NAND flash, on the other hand, can have a page size of 512 bytes, 2KB or 4KB (and often you don’t know what size it really is), so the IO alignment question becomes more relevant.
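To make the difference concrete, the sketch below checks a few hypothetical IO offsets against 512-byte sectors and 2KB/4KB flash pages. Every offset is a multiple of 512, so an HDD never sees them as misaligned, yet several are misaligned for a flash page.

```javascript
// Sketch: 512-byte-aligned offsets can still be misaligned for flash pages.
// Offsets are hypothetical; 32256 bytes is sector 63, a common start for
// legacy DOS-style partitions, and 1048576 bytes is 1MiB.
function isAligned(offsetBytes, unitBytes) {
  return offsetBytes % unitBytes === 0;
}

function misalignedOffsets(offsets, unitBytes) {
  return offsets.filter((o) => !isAligned(o, unitBytes));
}

const ioOffsets = [0, 3584, 6144, 32256, 1048576]; // all multiples of 512
for (const unit of [512, 2048, 4096]) {
  console.log(`${unit}-byte units: misaligned = [${misalignedOffsets(ioOffsets, unit).join(", ")}]`);
}
// 512-byte units: misaligned = []
// 2048-byte units: misaligned = [3584, 32256]
// 4096-byte units: misaligned = [3584, 6144, 32256]
```

Since the real flash page size is often unknown, aligning to a generous boundary (such as the 1MiB offset above) covers all three cases at once.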
It was and still is, however, relevant with HDD RAID storage – technology we have been using for many years – when there’s striping like in RAID0, 5, 6 or any variation of them (5+0, 1+0, 1+0+0 etc.). While IO inside the RAID is perfectly aligned to disk sectors (again due to the fact operations are done in multiples of 512-bytes), outside of the RAID you want to align IO to a stripe element as you may otherwise end up reading or writing to more disks than you would like to. I decided to do some benchmarks on a hard disk array and see when this matters and whether it matters at all.
In this article, however, I will focus on the process of alignment; if you’re curious about benchmark results, here they are.
What is IO alignment
I would like to start with some background on IO alignment. So what is IO alignment, and what does a misaligned IO look like? Here is one example of it:

In this case the RAID controller is using a 32KB stripe unit, which can fit 2 standard InnoDB pages (16KB in size) as long as they are aligned properly. In the first case, when reading or writing a single InnoDB page, the RAID will only read or write to a single disk because of the alignment to a stripe unit. In the second example, however, every other page spans two disks, so there are going to be twice as many operations to read or write these pages, which could mean more waiting in some cases and more work for the RAID controller for the same operation. In practice stripes are bigger by default (I often see 64KB, the mdadm default chunk size, or 128KB stripe units), so in these cases fewer pages would span multiple disks and the effects of misalignment would be less significant.
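The arithmetic behind this example can be sketched as follows. Sizes are in KB; the 8KB shift used for the misaligned case is a hypothetical value, since the exact offset in the original figure is not recoverable.

```javascript
// Sketch: how many stripe units (and so, with striping, how many disks)
// a single IO touches. All sizes in KB.
function stripeUnitsTouched(offset, length, stripeUnit) {
  const first = Math.floor(offset / stripeUnit);
  const last = Math.floor((offset + length - 1) / stripeUnit);
  return last - first + 1;
}

// Stripe units touched by each of `pageCount` consecutive pages that start
// `shift` KB into the device.
function pagesSpanning(shift, pageSize, stripeUnit, pageCount) {
  const counts = [];
  for (let i = 0; i < pageCount; i++) {
    counts.push(stripeUnitsTouched(shift + i * pageSize, pageSize, stripeUnit));
  }
  return counts;
}

console.log(pagesSpanning(0, 16, 32, 4).join(" ")); // 1 1 1 1  (aligned, 32KB unit)
console.log(pagesSpanning(8, 16, 32, 4).join(" ")); // 1 2 1 2  (every other page spans two disks)
console.log(pagesSpanning(8, 16, 64, 8).join(" ")); // 1 1 1 2 1 1 1 2  (64KB unit: one page in four)
```

The last line shows why bigger stripe units soften the damage: the same shift makes one page in four span two disks instead of one in two.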
Here’s another example of misalignment, described in SGI xfs training slides:

D stands for disk here, so the RAID has 4 bearing disks (spindles). If the file system is misaligned, the RAID ends up doing 5 IO operations, two to D4 and one to each of the other three disks, instead of just one IO to each disk. Even though this is a single IO request from the OS, it is guaranteed to be slower for both reads and writes.
So, how do we avoid misalignment? We must ensure alignment on each layer of the stack. Here's what a typical stack looks like:

Let’s talk about each of them briefly:
InnoDB page
You don't need to do anything to align InnoDB pages. The file system takes care of that, assuming you configure the file system correctly. I would, however, mention a couple of things about InnoDB storage. First, in Percona Server you can now customize the page size, and it may be a good idea to check that the page size is no bigger than a stripe element. Second, logs are actually written in 512-byte units (in Percona Server 5.1 and 5.5 you can customize this). Here I will be talking about InnoDB data pages, which are 16KB in size.
File system
The file system plays a very important role here. At a certain level, it maps a file's logical addresses to physical addresses. When writing a file, the file system decides how to distribute the writes so they make the best use of the underlying storage, and it makes sure the file starts in a proper position with respect to the stripe size. The size of logical IO units is also up to the file system.
The goal is to write and read as little as possible. If you are going to write mostly small (say 500-byte) files, it's best to use 512-byte blocks. For bigger files, 4k may make more sense. On Linux you can't use blocks bigger than 4k, the page size, unless you are using HugePages. Some file systems let you set the stripe width and stripe unit size so they can align properly based on those values. Mind, however, that different file systems (and different versions of them) may use different units for these options, so check the manual on your system to be sure you're doing the right thing.
Say we have a 6-disk RAID5 (so 5 bearing disks) with a 64k stripe unit size and a 4k file system block size. Here's how we would create the file system:
xfs:
mkfs.xfs -b size=4k -d su=64k,sw=5 /dev/ice
(alternatively, you can specify sunit=X,swidth=Y as options when mounting the device)

ext2/3/4:
mke2fs -b 4096 -E stride=16,stripe-width=80 /dev/ice
(some older versions of ext2/3 do not support stripe-width)
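The option values above follow from simple arithmetic. The ext stride is the stripe unit divided by the file system block size, and stripe-width is the stride times the number of bearing disks. XFS su/sw are the stripe unit and the number of bearing disks. The Python sketch below also derives the mount-time sunit/swidth values, assuming, as the XFS mount documentation states, that they are expressed in 512-byte sectors. The function name is hypothetical.

```python
SECTOR = 512  # bytes; unit of the XFS sunit/swidth mount options

def fs_stripe_options(stripe_unit_kb, bearing_disks, fs_block_bytes=4096):
    """Derive file system stripe options from RAID geometry (sketch)."""
    stripe_unit = stripe_unit_kb * 1024
    if stripe_unit % fs_block_bytes:
        raise ValueError("stripe unit must be a multiple of the block size")
    stride = stripe_unit // fs_block_bytes  # ext stride, in fs blocks
    return {
        "ext_stride": stride,
        "ext_stripe_width": stride * bearing_disks,  # in fs blocks
        "xfs_su": f"{stripe_unit_kb}k",
        "xfs_sw": bearing_disks,
        "xfs_mount_sunit": stripe_unit // SECTOR,  # in 512-byte sectors
        "xfs_mount_swidth": stripe_unit // SECTOR * bearing_disks,
    }

# 6-disk RAID5 => 5 bearing disks, 64k stripe unit, 4k blocks
opts = fs_stripe_options(64, 5)
print(opts)  # ext stride=16, stripe-width=80; xfs sunit=128, swidth=640
```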
You should be all set with the file system alignment at this point. Let’s get down one more level:
LVM
If you are using LVM, you want to make sure it does not introduce misalignment. On the other hand, you can also use it to compensate for a misaligned partition table. On the system I benchmarked, the defaults worked out just fine because I was using a rather small 64k stripe element. Had I used 128k or 256k RAID stripe elements, the LVM physical extents would have started somewhere in the middle of a stripe element, which would in turn screw up the file system alignment.
You can only set alignment options early in the process, when you run pvcreate to initialize a disk for LVM use. The two options you are interested in are --dataalignment and --dataalignmentoffset. If you set the offset correctly when creating partitions (see below), you don't need --dataalignmentoffset. Otherwise, this option lets you shift the beginning of the data area to the start of the next stripe element. Set --dataalignment to the size of the stripe element, so that every Physical Extent starts at the beginning of a stripe element.
In addition to setting the correct options for pvcreate, it is also a good idea to use an appropriate Volume Group Physical Extent Size for vgcreate. I think the default of 4MB should be good enough for most cases. If you change it, I would try not to make it smaller than the stripe element size.
To give you a more interesting alignment example, let's assume we have a RAID with a 256k stripe element size and a misaligned partition table: partition /dev/sdb1 starts 11 sectors past the start of a stripe element (reminder: 1 sector = 512 bytes). We want the data area to begin at the start of the next stripe element, i.e. 256k (512 sectors) further on. So we need to offset the start by 512 - 11 = 501 sectors and set the proper alignment:
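The offset in this example can be computed mechanically. Here is a small Python sketch under the same assumptions (512-byte sectors, partition start given in absolute sectors). The helper name and the example start sector 51211 (100 stripe elements plus 11 sectors) are hypothetical.

```python
SECTOR = 512

def lvm_data_alignment_offset(partition_start_sector, stripe_element_kb):
    """Sectors to skip so the LVM data area starts on a stripe element boundary."""
    element_sectors = stripe_element_kb * 1024 // SECTOR
    misalignment = partition_start_sector % element_sectors
    # Already aligned partitions need no offset at all.
    return (element_sectors - misalignment) % element_sectors

start = 100 * 512 + 11  # hypothetical: 11 sectors past a 256k boundary
off = lvm_data_alignment_offset(start, 256)
print(f"pvcreate --dataalignmentoffset {off}s --dataalignment 256k /dev/sdb1")
```

This prints the same pvcreate command as above. A partition that already starts on a boundary gets an offset of 0, in which case --dataalignmentoffset can be dropped.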
pvcreate --dataalignmentoffset 501s --dataalignment 256k /dev/sdb1
You can check where physical extents will start (or check your current setup) using pvs -o +pe_start. Now let’s move down one more level.
Partition table
This is the most frustrating part of IO alignment. I think people get frustrated with it because, by default, fdisk uses “cylinders” as units instead of sectors. Moreover, some “older” systems like RHEL5 actually align to “cylinders” and leave the first “cylinder” blank. This comes from the days when disks were really small and were actual physical disks. The drive geometry displayed here is not real. This RAID does not really have 255 heads and 63 sectors per track:
db2# fdisk -l

Disk /dev/sda: 1198.0 GB, 1197998080000 bytes
255 heads, 63 sectors/track, 145648 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
...
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        1216     9764864   83  Linux
/dev/sda2            1216        1738     4194304   82  Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3            1738      145649  1155959808   83  Linux
Partition 3 does not end on cylinder boundary.
So these days it makes a lot more sense to use sectors with fdisk. You can switch to sectors with -u when invoking it, or with “u” in interactive mode:
db2# fdisk -ul

Disk /dev/sda: 1198.0 GB, 1197998080000 bytes
255 heads, 63 sectors/track, 145648 cylinders, total 2339840000 sectors
Units = sectors of 1 * 512 = 512 bytes
...
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048    19531775     9764864   83  Linux
/dev/sda2        19531776    27920383     4194304   82  Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3        27920384  2339839999  1155959808   83  Linux
Partition 3 does not end on cylinder boundary.
The rest of the task is easy. You just have to make sure that the Start sector divides by the number of sectors in a stripe element without a remainder. Let's check whether /dev/sda3 aligns to a 1MB stripe element. 1MB is 2048 sectors, and 27920384 divided by 2048 is exactly 13633, so it does align to a 1MB boundary.
Recent systems like RHEL6 (not verified) and Ubuntu 10.04 (verified) align to 1MB by default if the storage does not provide IO alignment hints, which is good enough for most cases. However, here's what I got on Ubuntu 8.04 using the defaults (you would get the same on RHEL5 and many other systems):
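The same divisibility check, sketched in Python for every partition in the db2 listing above. The helper name is hypothetical. The stripe element size (1MB) and start sectors are taken from the text and output above.

```python
SECTOR_BYTES = 512

def is_aligned(start_sector, stripe_element_kb):
    """True if a partition starting at start_sector begins on a stripe element boundary."""
    element_sectors = stripe_element_kb * 1024 // SECTOR_BYTES
    return start_sector % element_sectors == 0

# Start sectors from the db2 "fdisk -ul" listing
db2 = {"sda1": 2048, "sda2": 19531776, "sda3": 27920384}
for name, start in db2.items():
    print(name, is_aligned(start, 1024))  # 1MB stripe element
```

All three partitions on db2 start on a 1MB boundary.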
db1# fdisk -ul

Disk /dev/sda: 1197.9 GB, 1197998080000 bytes
255 heads, 63 sectors/track, 145648 cylinders, total 2339840000 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00091218

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63    19535039     9767488+  83  Linux
/dev/sda2        19535040    27342629     3903795   82  Linux swap / Solaris
/dev/sda3        27342630  2339835119  1156246245   8e  Linux LVM
sda1 does not even align to 1k, and sda3 aligns to 1k but not to 2k. sda2 aligns up to 32k, but the RAID controller actually uses a 64k stripe element, so all IO on this system is unaligned (unless LVM compensates for it, see above). So on such a system, when creating partitions with fdisk, don't use the default start sector. Instead, use the next number that divides by the number of sectors in a stripe element without a remainder, and make sure you're using sectors as units to simplify the math.
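The checks described above can be sketched in a few lines of Python. alignment_bytes reports the largest power-of-two boundary a start sector falls on, using the lowest set bit of its byte offset. next_aligned_start rounds the fdisk default up to the next stripe element boundary. Both helper names are hypothetical. The start sectors come from the db1 listing, and the 64k stripe element is the one mentioned in the text.

```python
SECTOR = 512

def alignment_bytes(start_sector):
    """Largest power-of-two byte boundary a (positive) start sector falls on."""
    start_bytes = start_sector * SECTOR
    return start_bytes & -start_bytes  # isolate the lowest set bit

def next_aligned_start(default_start, stripe_element_kb):
    """Round a start sector up to the next stripe element boundary."""
    element_sectors = stripe_element_kb * 1024 // SECTOR
    return -(-default_start // element_sectors) * element_sectors  # ceiling

for name, start in {"sda1": 63, "sda2": 19535040}.items():
    print(name, alignment_bytes(start))  # sda1 512, sda2 32768

# Instead of fdisk's default of 63, start at the first 64k boundary
print(next_aligned_start(63, 64))  # 128
```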
Besides the DOS partition table, which you would typically manage with fdisk (or cfdisk, or sfdisk), there's also the more modern GUID partition table (GPT). The usual tool for working with GPT is parted. If you are already running GPT on your system and want to check whether it's aligned, here's a command for you:
db2# parted /dev/sda unit s print
Model: LSI MegaRAID 8704EM2 (scsi)
Disk /dev/sda: 2339840000s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start       End          Size         Type     File system     Flags
 1      2048s       19531775s    19529728s    primary  ext4            boot
 2      19531776s   27920383s    8388608s     primary  linux-swap(v1)
 3      27920384s   2339839999s  2311919616s  primary
This is the same layout we saw from fdisk earlier; parted also reads DOS partition tables, hence “Partition Table: msdos”. Again, look at the Start sector and make sure it divides by the size of the stripe element without a remainder.
Lastly, if you are not working on a system boot disk, you may not need a partition table at all. You can use the whole raw /dev/sdb and either format it with mkfs directly or add it as an LVM physical volume. This avoids any mistakes when working with the partition table.
RAID stripe
Further down the storage stack there's a group of RAID stripe units (elements). This group is sometimes called a stripe, though most tools call it the stripe width. The RAID level, the number of disks and the size of a stripe element determine the stripe width. RAID1 and JBOD do no striping. With RAID0 the number of bearing disks is the actual number of disks (N). With RAID1+0 (RAID10) it's N/2, with RAID5 it's N-1 (single parity) and with RAID6 it's N-2 (double parity). You need to know these numbers when setting file system parameters, but once the RAID is configured, there's nothing more you can do about them.
The stripe unit size is the amount of data written to a single disk before moving on to the next disk in the array. It is also one of the options you usually have to decide on very early, when configuring the RAID.
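The bearing-disk rules above can be captured in a small lookup. This Python sketch is a hypothetical helper, not a tool from the article. It returns the number of data-bearing disks and the resulting full stripe width for the striped levels listed.

```python
# Data-bearing disks per RAID level, following the rules described above.
# RAID1 and JBOD are deliberately absent: they do no striping.
BEARING = {
    "0": lambda n: n,        # plain striping
    "10": lambda n: n // 2,  # striped mirrors
    "5": lambda n: n - 1,    # single parity
    "6": lambda n: n - 2,    # double parity
}

def bearing_disks(level, n_disks):
    if level not in BEARING:
        raise ValueError(f"RAID{level} is not striped (or not covered here)")
    return BEARING[level](n_disks)

def stripe_width_kb(level, n_disks, stripe_element_kb):
    """Full stripe width = bearing disks x stripe element size."""
    return bearing_disks(level, n_disks) * stripe_element_kb

print(bearing_disks("5", 6), stripe_width_kb("5", 6, 64))    # 5 320
print(bearing_disks("10", 4), stripe_width_kb("10", 4, 64))  # 2 128
```

The second line corresponds to a 4-disk RAID10 with a 64k stripe element, the kind of array benchmarked in the second part.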
Disk sectors
Most, if not all, hard disk drives on the market these days use 512-byte sectors. So most of the time you don't need to care about alignment at this level, and neither do RAID controllers, since they also operate in 512-byte units internally. This gets more complicated with SSDs, which often operate in 4KB units, though that is a topic for another post.
Summary
There may seem to be many moving parts between the database and the actual disks, but proper alignment is not that difficult to achieve if you're careful when configuring every layer. However, you don't always have a fully transparent system. In the cloud, for example, you don't really know whether the data is properly aligned underneath, whether you should be using an offset, what the stripe size is or how many stripe elements there are. It's easy to check whether you're aligned: run a benchmark with an offset and compare it to a baseline. If you are not aligned, though, it's much harder to figure out the proper alignment options.
Now it may be interesting to see what the real-life effects of misalignment are. My benchmark results are in the second part.
In the first part of this article I showed how I align IO. Now I want to share the results of the benchmarks I ran to see how much proper IO alignment helps on a 4-disk RAID1+0 with a 64k stripe element. I haven't run any benchmarks in a while, so be careful with my results and forgiving of my mistakes.
The environment
Here is the summary of the system I have been running this on (for brevity I have removed some irrelevant information):
# Aspersa System Summary Report ##############################
Platform | Linux
Release | Ubuntu 10.04.2 LTS (lucid)
Kernel | 2.6.32-31-server
Architecture | CPU = 64-bit, OS = 64-bit
# Processor ##################################################
Processors | physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Speeds | 24x1600.000
Models | 24xIntel(R) Xeon(R) CPU X5650 @ 2.67GHz
Caches | 24x12288 KB
# Memory #####################################################
Total | 23.59G
...
Locator Size Speed Form Factor Type Type Detail
========= ======== ================= ============= ============= ===========
DIMM_A1 4096 MB 1333 MHz (0.8 ns) DIMM {OUT OF SPEC} Other
...
# Disk Schedulers And Queue Size #############################
sda | [deadline] 128
# RAID Controller ############################################
Controller | LSI Logic MegaRAID SAS
Model | MegaRAID SAS 8704EM2, PCIE interface, 8 ports
Cache | 128MB Memory, BBU Present
BBU | 100% Charged, Temperature 34C, isSOHGood=
VirtualDev Size RAID Level Disks SpnDpth Stripe Status Cache
========== ========= ========== ===== ======= ====== ======= =========
0(no name) 1.088 TB 1 (1-0-0) 2 2-2 64 Optimal WT, RA
PhysiclDev Type State Errors Vendor Model Size
========== ==== ======= ====== ======= ============ ===========
Hard Disk SAS Online 0/0/0 SEAGATE ST3600057SS 558.911
Hard Disk SAS Online 0/0/0 SEAGATE ST3600057SS 558.911
Hard Disk SAS Online 0/0/0 SEAGATE ST3600057SS 558.911
Hard Disk SAS Online 0/0/0 SEAGATE ST3600057SS 558.911
The summary says the controller cache is set to write-through (WT). In fact, I repeated every benchmark with (a) write-through and (b) write-back cache to see whether write-back cache would minimize the effects of misalignment.
The file system of choice was XFS. Barriers and the physical disk cache were disabled. The tool I used was sysbench 0.4.10, which came with this Ubuntu system. I ran every fileio benchmark and an IO-bound read-write oltp benchmark in autocommit mode.
File IO benchmark
For the FileIO benchmark, I used 64 files totaling 1GB, 4GB and 16GB, with 1, 4 and 8 threads. Operations were done in 16kB units to mimic InnoDB pages. I ran into a couple of interesting surprises:
1. After I got (what I thought was) the best configuration, I added LVM on top of it, and performance improved by another 20-40%. It took me a while to figure out what happened. For the XFS file system on a raw partition, I was using the full partition size, which was slightly over 1TB. When I added LVM on top, however, I made the logical volume slightly below 1TB. Investigating this, I found that 32-bit xfs inodes (the default) have to live in the first terabyte of the device. This seems to have affected performance here (IMO because of where the first data extents were placed in this case). When I mounted the partition with the inode64 option, the effect disappeared, and performance without LVM was slightly better than with LVM, as expected. I had to redo all of the benchmarks to get the numbers right.
2. I was running vmstat during one of the tests, and a spike in OS buffers during the “prepare” phase of sysbench caught my eye. It turned out that sysbench did not honor --file-extra-flags during the “prepare” phase. So instead of being created with direct IO, the files were buffered in the OS cache. Writes to them were serialized until the files were fully overwritten and thereby flushed from the OS buffers. The buffers were flushed within the first few seconds, so the effect was marginal. Alexey Kopytov fixed this in the sysbench trunk immediately, but I didn't want to recompile sysbench on this system, so I used Domas' uncache after prepare to make sure the caches were clean.
OLTP benchmark
The goal was to compare performance with different IO alignment, not different MySQL configurations, so I didn't try different MySQL versions or settings. Moreover, I was running these benchmarks for a customer, so I simply used the settings they would have used anyway. The one thing I did change was the InnoDB buffer pool, which I reduced significantly to make sure the benchmark was IO bound.
That said, the benchmark ran on Percona Server 5.0.92-87 with the following my.cnf configuration:
[mysqld]
datadir=/data/mysql
socket=/var/run/mysqld/mysqld.sock
innodb_file_per_table = true
innodb_data_file_path = ibdata1:10M:autoextend
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 8M
innodb_buffer_pool_size = 128M
innodb_log_file_size = 64M
innodb_log_files_in_group = 2
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_io_capacity = 200
port = 3306
back_log = 50
max_connections = 2500
max_connect_errors = 10
table_cache = 2048
max_allowed_packet = 16M
binlog_cache_size = 16M
max_heap_table_size = 64M
thread_cache_size = 32
query_cache_size = 0
tmp_table_size = 64M
key_buffer_size = 8M
bulk_insert_buffer_size = 8M
myisam_sort_buffer_size = 8M
myisam_max_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover
skip-grant-tables
The table had 20M rows, transactions were not used (autocommit), and the thread counts were 1, 4, 8, 16 and 32.
Benchmark scenarios
Here are the different settings I ran the same benchmark on. As mentioned earlier, each was run twice: first with the RAID controller cache set to Write-Through and then to Write-Back.
1. Baseline: misaligned partition table, no LVM and no alignment settings in the file system. This is what you would often get on RHEL5, Ubuntu 8.04 or similar “older” systems if you did nothing about IO alignment.
2. Misaligned partition table, but proper alignment options on the file system. This is what you get when the file system tries to balance writes but doesn't know that it is not aligned to the beginning of a stripe element.
3. 1M alignment in the partition table but no options on the file system. You should get this on RHEL6, Ubuntu 10.04 and similar systems if you do nothing about IO alignment yourself. In this case the offset is correct, but the file system doesn't know how to align files properly.
4. Partition table and file system properly aligned; sunit/swidth set during mkfs. No LVM at this point.
5. Partition table aligned properly; sunit/swidth set at mount time but not during mkfs. This is your best option if the partition table is properly aligned but you did not set alignment options in xfs when creating the file system, and you can't or don't want to reformat it. Note, however, that files written before this was set may still be unaligned, though xfs defragmentation may be able to fix that (not verified).
6. LVM added on top of the aligned partition table, with proper file system alignment.
Benchmark results
I had a hard time deciding how best to present the results so they're not too cluttered and still interesting. Instead of preparing charts for every benchmark, I'll first describe a few less interesting numbers, then show graphs for the more interesting results. Let me know if you think this was a bad idea.
File IO benchmark results
Sequential read results are, as expected, the least interesting. Read-ahead kicked in immediately, giving ~9,600 iops (~150MB/s) at 1 thread, ~14,500 iops (~230MB/s) at 4 threads and ~16,300 iops (~250MB/s) at 8 threads. Neither IO alignment nor file size made any difference. Adding LVM here reduced single-thread performance by 5-10%.
Sequential write results were a bit more interesting. With WT (write-through) cache, performance was really poor across the board, with virtually no difference between 1, 4 or 8 threads. Different file sizes made no difference either. Write-back cache gave an incredible performance boost, up to 33x in the single-threaded workload. File system IO alignment seems to have made a difference of up to 15% with write-back cache enabled. Here's 1GB seqwr with WT cache:
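The MB/s figures follow from the iops numbers, because every sysbench operation was 16kB. A quick Python check, where "MB" means 1024 KB:

```python
BLOCK_KB = 16  # sysbench operations were done in 16kB units

def iops_to_mb_s(iops, block_kb=BLOCK_KB):
    """Throughput in MB/s (1 MB = 1024 KB) for fixed-size operations."""
    return iops * block_kb / 1024

for threads, iops in [(1, 9600), (4, 14500), (8, 16300)]:
    print(threads, round(iops_to_mb_s(iops)))
```

This gives 150, 227 and 255 MB/s, consistent with the rounded figures quoted above.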

Here’s same test with WB cache:

And just to show you the difference between sequential writes with WT cache and WB cache:

Random read. This is probably the most interesting number for OLTP workloads, which are usually light on writes (especially with a BBU-protected Write-Back cache) and heavy on random reads. The difference between aligned and misaligned reads was the same regardless of file size, and WT vs WB cache of course made no difference at all. Here are the results:

As you can see, IO alignment makes a difference here, improving performance by up to 15% with 8 threads running concurrently. The customer's database was much bigger than 16G, so I repeated the random read (and write) benchmark with 8 threads and a total size of 256G. The number of operations per second was slightly lower, but the difference was still 15%: 909 iops unaligned vs 1049 aligned.
Random write. This is an important metric for write-intensive workloads, where a lot of data is modified, inserts go to random positions (non-consecutive PKs causing page splits), etc. The results are fairly consistent regardless of file size. First, the results with WT cache:

And here’s with WB cache:

Apparently proper IO alignment gives up to a 23% improvement in this case when WB cache is used. With WT cache, the single-thread improvement is marginal. WB cache, however, brings single-thread random write performance close to what 8 threads can do, and IO alignment adds an extra 23% in this case.
I also ran the same larger-file test as for random reads, i.e. an 8-thread random write benchmark on files totaling 256GB. With WB cache enabled, I got 919 iops unaligned and 1127 iops aligned, so the improvement is still 23%.
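For reference, here is how the 15% and 23% figures for the 256GB runs follow from the raw iops, as the relative gain of aligned over unaligned:

```python
def improvement_pct(unaligned, aligned):
    """Relative gain of the aligned setup over the unaligned one, in percent."""
    return (aligned - unaligned) / unaligned * 100

print(f"random read:  {improvement_pct(909, 1049):.1f}%")   # 15.4%
print(f"random write: {improvement_pct(919, 1127):.1f}%")   # 22.6%
```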
OLTP benchmark results
From this benchmark, I only have two graphs to show you. First one is with RAID controller set to WT cache:

The second is with WB cache:

I couldn't figure out exactly what happened with setting #3 when WB cache was disabled. What I do know is that, based on the IO stats I gathered during the benchmarks, the cause was a lower number of IO operations and a higher response time. So it seems that in this case misaligned IO had some collateral effects in a mixed read/write environment. Note that the benchmarks were all scripted, and the oltp benchmarks started automatically after the file tests. If there had been an error in the setting, it would have shown up in all the other benchmarks for the same setting.
Summary
For the two workloads most relevant to databases, random reads and random writes, IO alignment on a 4-disk RAID10 with a standard 64k stripe element size makes a significant difference. When I launched the system I was benchmarking, I could clearly see the difference in production. Another machine with the same hardware but misaligned IO was running side by side with it. Here are diskstats from the two shards:
Aligned:
#ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg
{540} dm-0 447.1 34.0 7.4 0% 2.4 5.4 23.4 49.6 0.6 0% 0.0 0.6 85% 0
Misaligned:
#ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg
{925} dm-0 462.1 34.1 7.7 0% 3.8 8.2 12.1 87.0 0.5 0% 0.0 0.7 93% 0
From the OS perspective the number of read operations is very similar, but under this highly concurrent load the read response time in the aligned case is significantly better (5.4 ms vs 8.2 ms).
It would, however, be interesting to run similar benchmarks on a larger RAID5 system, where alignment should make an even bigger difference for writes. Another interesting setting might be a [mirrored] RAID0 with many more stripes, where the lack of proper file system alignment should have really interesting effects. A large stripe element, on the other hand, should somewhat reduce the effects of misalignment, though it would definitely be interesting to run benchmarks and verify that. If you have numbers to share, please leave a comment. Next, I plan to look at IO alignment on Flash cards to see what proper alignment gains us there.
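The rd_cnc column is consistent with Little's law (average requests in flight = arrival rate × average response time). Assuming rd_cnc is the average read concurrency and rd_rt is in milliseconds, this Python sketch reproduces it from the other two columns. At nearly equal read rates, the higher response time accounts for the higher concurrency on the misaligned shard.

```python
def concurrency(ops_per_s, response_ms):
    """Little's law: average requests in flight = rate x response time."""
    return ops_per_s * response_ms / 1000

print(round(concurrency(447.1, 5.4), 2))  # aligned: 2.41 (rd_cnc 2.4)
print(round(concurrency(462.1, 8.2), 2))  # misaligned: 3.79 (rd_cnc 3.8)
```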
You can find scripts and plain data here on our public wiki.
#6 This post is part of the Velocity countdown series. Stay tuned for the articles to come.
Alrighty, this is something I talked about last year at HighLoad++ and Fronteers but never blogged about. I came up with it while at Yahoo! Search, and we used it there in production. So, it must be working. It's probably not practical for many sites, but the takeaway might very well be: optimizations are all around us, we just have to look for them.
The summary: browsers have built-in search boxes which also give you search suggestions. IE8+ added visual search suggestions, meaning images. The trick here is: when you don't have a visual suggestion, why not "suggest" an image for preloading, such as a sprite. Or two.
On to the details.
Chrome search
Every browser these days has them search boxes. Usually in the upper right. Like this:

(Aside rant: I call these browser search boxes "chrome search", but with the advent of the Chrome browser by Google and the fact that Google does search, this is now confusing, even doubly so. So next time you name a project or a company using a name that is already full of meaning such as Chrome or Closure, you're just being unimaginative and.. yeah, not cool. Unless of course it's something like Apple that's too far off from the actual product.)
Chrome search suggestions
Depending on how the search provider was set up, while you type the browser can send the partial string you typed to a URL of the search provider's choice. The search provider then can return a list of suggestions.
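For textual suggestions, a provider commonly answers in the JSON flavour of the OpenSearch Suggestions extension. In that format the first element of an array is the query and the second is the list of completions; the optional description and URL arrays are omitted here. Here is a minimal Python sketch of such a response body, assuming that format. The prefix match against a hard-coded list is a hypothetical stand-in for a real suggestion index.

```python
import json

def suggestions_json(query, candidates, limit=10):
    """Build an OpenSearch-style JSON suggestion response (sketch)."""
    q = query.lower()
    matches = [c for c in candidates if c.lower().startswith(q)][:limit]
    return json.dumps([query, matches])

print(suggestions_json("sch", ["schmuck", "school", "scheme", "weather"]))
# ["sch", ["schmuck", "school", "scheme"]]
```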

BTW, you don't have to be Google or Yahoo in order to be a search provider. This is "open search" and any site could (and frankly, should) be a search provider, here's an example how.
Visual chrome search suggestions
Starting with IE8, the provider can not only offer textual suggestions, but also visual ones, meaning images. So if you look up a stock symbol, it can give you a chart. Or a weather forecast.

The search provider sends JSON or XML back to the browser. The visual suggestions actually require XML. Here's more from the horse's mouth. To send an image back, you use syntax like this:
...
<Item>
  <Text>Currently: Partly Cloudy, 67F</Text>
  <Description>High: 71F Low: 63F</Description>
  <Url>http://weather.yahoo.com/forecast/USCA1024_f.html</Url>
  <Image source="http://path/to/my/image.png" alt="Partly Cloudy" width="31" height="31"/>
</Item>
...
Preload in visual chrome search suggestions
So what if you don't have an appropriate image for a search term? Or maybe a partial term, say like for example "sch"? Why not load a sprite or some other image, hidden, 0x0 instead. Something that you can be reasonably confident the user will need once they get to the search results page. Easy-peasy:
...
<Item>
  <Text>schmuck</Text>
  <Url>http://whatever.org</Url>
  <Image source="http://path/to/my/sprite.png" width="0" height="0"/>
</Item>
...
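Generating such a response on the server side is straightforward. The Python sketch below builds one with the standard library's ElementTree. It assumes IE8's XML suggestion format, in which Item elements are wrapped in a SearchSuggestion root with Query and Section children. The function name and dict keys are hypothetical, and the URLs are the placeholders from the snippet above.

```python
import xml.etree.ElementTree as ET

def suggestion_xml(query, items):
    """Build an IE8-style XML suggestion document (assumed format).

    Each item is a dict with "text" and "url"; an optional "preload" URL
    is emitted as a 0x0 Image so the browser fetches it without showing it.
    """
    root = ET.Element("SearchSuggestion")
    ET.SubElement(root, "Query").text = query
    section = ET.SubElement(root, "Section")
    for item in items:
        el = ET.SubElement(section, "Item")
        ET.SubElement(el, "Text").text = item["text"]
        ET.SubElement(el, "Url").text = item["url"]
        if "preload" in item:
            # width/height of 0 keep the preloaded image invisible
            ET.SubElement(el, "Image", source=item["preload"], width="0", height="0")
    return ET.tostring(root, encoding="unicode")

print(suggestion_xml("sch", [{
    "text": "schmuck",
    "url": "http://whatever.org",
    "preload": "http://path/to/my/sprite.png",
}]))
```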
This way you're in effect preloading something that will be needed later. I mean once the user starts typing, there's a pretty decent chance they'll end up on the results page. And they are likely to prefer a fast results page. It will be fast when pieces of it were pre-downloaded.
And if you inspect what's going on in HTTPWatch, you can see that as you type, the browser sends a request to the search provider, which then returns an XML document. And an image, although no image is seen.

What didn't work
I tried preloading CSS and JS too, but it didn't work. These show broken image icons, even with 0x0 size. I also tried 1x1 but then there's a 1x[LINE-HEIGHT] grey square/line.

And that's that
So here - a way to preload images for free, by simply using a browser feature.
You can also use this to preload 1x1 beacons for DNS resolution purposes. Say you serve search suggestions from suggest.example.org but the results are on www.example.org and the static stuff is on static1.example.org, static2.example.org, etc. There's an opportunity to pre-resolve all these domain names by requesting a tiny image from each.
Thanks for reading, this was the second post from the Velocity countdown. Only 5 more days to go! Hope you learned something, even if you don't intend to use it right away. Or maybe at least you've decided to offer your site as a search provider to all browsers. Free marketing, why not?
See you tomorrow.
Fellow Nestorelfling, Vuk, met some of the team from CASA, the Centre for Advanced Spatial Analysis, at the latest Geomob, and seeing as we've been rapping a lot on data visualisation across at the Nestoria Australia blog, he thought I might like to be the one to pitch some questions to the Director of the Centre, Andrew Hudson-Smith.
As well as being the CASA Director, Andrew is also elected Fellow of the Royal Society of Arts, the Course Founder and Director of the MRes in Advanced Spatial Analysis and Visualisation at University College London, and the man behind the Digital Urban blog, which is where you can learn more about what they're up to at CASA without enrolling as a student – the blog is smart but not overly academic.
You're the Director of the Centre for Advanced Spatial Analysis (CASA). Can you give us a basic idea of what the centre is about, what CASA is primarily interested in and what sort of projects it's been involved in?
CASA is a uniquely multi-disciplinary centre with a focus on the functions, growth and operations of cities and the urban environment as a whole. The research is wide ranging, from complexity, flows and simulation through to digital sensors, tagging, augmented reality, modelling and mass data collection via crowd sourcing. The phrase Smart Cities is a tag currently used within industry; I would describe CASA’s research as Beyond Smart Cities.
I'm also interested in what field you'd say you primarily belong to?
And what are some of the fields the rest of your team would say they primarily belong to?
Personally I am a geographer and urban planner with a research outlook on all things digital and how digital technologies change the way we view and interact with the urban environment. CASA’s core strength is its research staff and PhD students from a wide range of fields. We are an open plan lab with a mix of mathematicians, physicists, archaeologists, architects, urban planners, computer scientists and geographers, arguably all the ingredients required to make sense, model and predict the future of the city.
New technology has made the science of mapping more accessible to everyone. What are some of the things that have come from greater user access and participation?
And do you have any predictions for what this could lead on to in the future?
The last five years have been game changing. We now take for granted both data and the ability to zoom into any aspect of the earth in three dimensions, all from a location-aware mobile device. Arguably this game change came from Google, and the waves from Google Maps and Google Earth have radiated out to allow greater user access and participation. Digital geography is now in The Cloud, with old school GIS playing catch up with citizen-created mashups, community-built open source applications and innovative developments from small start ups. The neogeography debate has been and gone; we are now in an era where anyone can make a map, and collect, contribute and analyse data using any of the millions of data sources online. It’s an exhilarating time to be in the field, and access and participation are the key to our wider understanding of complex urban systems, with the crowd playing an increasingly important role.
What are your thoughts on 'social-geo' and what inventions may spring from this pairing?
Location, and more especially its link to social networks, is an interesting pairing. Consider the success of systems such as Facebook Places and Foursquare in linking social networks to location, with the implication of knowing not only our thoughts and activities but also where we are. Of note is the rapid rise of these networks, driven by advances in mobile technologies and location-aware smart phones and linked into our natural urge to share and be social. We are looking at innovations in ‘hyper-local social location’, so it's not only that you checked into a general location: we can define social-geo down to object level, towards the ‘Geography of Everything’.
What other interesting things do you see happening in this sphere at the moment?
And are there any other fields you'd like to welcome into your fold at CASA? (I saw that you've embraced history in your recent project with OXFAM for example.)
Personally I feel it's important to look beyond one's research comfort zone, to branch out into new fields while linking in your current knowledge to develop new lines of research. As such, CASA is now leading the way in the development of technology around the Internet of Things for adding provenance to objects in both the charity and museum sectors. We link this back into the urban realm by allowing architecture, street furniture and places to link into the network and act as portals for communication. In essence we are developing 'Read/Write' places linked to location, social interaction and augmented memories. We have recently tagged 4200 bus stops in Norway; these places can now tweet, collect and replay data via technology developed in CASA. So for the future we welcome all fields; at the moment the thought process is turning to biology and the analogy of biocomplex systems within the city infrastructure.
CASA's Analogue Tweet-o-metre, currently at the British Library
Your team is interested in modelling landscapes and modelling pedestrian movements, so how far off are we, technologically, from modelling real life in real time?
We have a new research project going live in 2012 looking at realtime feeds and simulation of the city. Integrating realtime data with modelling and simulation is a developing field and one in which I hope CASA will lead. Increasingly it is no longer about traditional datasets but what is happening in the city right now. From the next generation smart phones through to streams from urban remote drones we are on the edge of a new wave of city information systems.
Apart from my Big Brother-esque suggestion, what are some of the innovations your field is striving towards?
We are striving for a realtime sim-city with data and simulation running side by side. Tagging, tracking and collection of data by autonomous urban drones presents a big-brother view of the world. Yet a city, and indeed global, simulator fed with live data offers the potential to open up our understanding of urban phenomena. Keep an eye on CASA over the coming 12 months as this field starts to develop.
And how can someone interested in getting involved in these sorts of projects (and modes of world domination) best go about it?
There are many ways, ranging from enrolling in our new MRes course on Spatial Analysis and Visualisation, to looking into a PhD in CASA, or simply following our blogs and tutorials online. We publish our work via the CASA Blog network on a daily basis, with a mix of thoughts, progress and tutorials, to allow anyone to gain an insight into our work. It's an online world where tutorials abound; if you're interested in advanced spatial analysis, from sensing, modelling, mashing, mapping and augmenting, then now is the time, as the work is just getting interesting….
NDB cluster is a very interesting solution in terms of high availability, since there is no single point of failure. In an environment like EC2, where a node can disappear almost without notice, one would think that it is a good fit.
It is indeed a good fit, but reality is a bit trickier. The main issue we faced is that IPs are dynamic in EC2, so when an instance restarts, it gets a new IP. What's the problem with a new IP? Just change the IP in the cluster config and perform a rolling restart, no? In fact, this will not work: since the cluster is already in degraded mode, restarting the surviving node of the degraded node group (NoOfReplicas=2) will cause the NDB cluster to shut down.
This can be solved by using host names instead of IPs in the config.ini file. What needs to be done is to define, in /etc/hosts, one entry per cluster member. The API nodes are not required. Here is an example:
$ more /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.11.11.11 mgmn1
10.22.22.22 mgmn2
10.33.33.33 data1
10.44.44.44 data2
The file will be present and identical, at least for the NDB part, on all hosts. Next, the NDB configuration must use the hostnames, like this:
[NDBD DEFAULT]
NoOfReplicas=2
Datadir=/var/lib/mysql-cluster/
DataMemory=1G
IndexMemory=100M

[NDB_MGMD]
Id=1
Hostname=mgmn1

[NDB_MGMD]
Id=2
Hostname=mgmn2

[NDBD]
Id=3
Hostname=data1

[NDBD]
Id=4
Hostname=data2

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
This is, of course, a very minimalistic configuration but I am sure you get the point. Let’s go back to our original problem and consider that data2 went down. You spin up a new host, configure it with NDB. Consider, for example, that the IP of the new host is 10.55.55.55. The first thing to do is to update all the /etc/hosts files so that they look like:
$ more /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.11.11.11 mgmn1
10.22.22.22 mgmn2
10.33.33.33 data1
10.55.55.55 data2
Now you start the ndbd process (or ndbmtd) on data2. Since the management nodes read /etc/hosts before the change to get the IP from which to expect the connection, you'll get the “No free node” error on the first attempt. But when the management node loops back in its connection-handling code, it reads the new /etc/hosts file, and the second time the connection succeeds.
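Because the whole fix boils down to rewriting one line of /etc/hosts on every cluster member, that step is easy to script. Below is a minimal Python sketch, assuming the simple "IP hostname [aliases]" format shown above; `set_host_ip` is a hypothetical helper, not part of MySQL Cluster, and copying the result to each host (scp, a config-management tool, etc.) is left out.

```python
import sys


def set_host_ip(hosts_text: str, hostname: str, new_ip: str) -> str:
    """Return hosts_text with `hostname` mapped to `new_ip`.

    Entries that already list `hostname` get only their address replaced
    (aliases are kept). If no entry mentions it, a new line is appended.
    Comments and unrelated entries are left untouched.
    """
    lines_out = []
    found = False
    for line in hosts_text.splitlines():
        fields = line.split()
        is_entry = bool(fields) and not fields[0].startswith("#")
        if is_entry and hostname in fields[1:]:
            lines_out.append(" ".join([new_ip] + fields[1:]))
            found = True
        else:
            lines_out.append(line)
    if not found:
        lines_out.append(f"{new_ip} {hostname}")
    return "\n".join(lines_out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python set_host_ip.py /etc/hosts data2 10.55.55.55
    path, name, ip = sys.argv[1:4]
    with open(path) as f:
        updated = set_host_ip(f.read(), name, ip)
    with open(path, "w") as f:
        f.write(updated)
```

Run it on each member (as root), e.g. `python set_host_ip.py /etc/hosts data2 10.55.55.55`, before starting ndbd/ndbmtd on the replacement node.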
Tonight I got on your train at 18:15 from Waterloo, and I got off your train at 22.59:45 at Woking after the following:
1. Total chaos at Waterloo with delayed trains and no explanation whatsoever
2. Finally getting on a train after failing to get on two previous ones as people were literally hanging out of the doors
3. Train rode well to Clapham then crept towards Woking – no explanation
4. Train stops
==== Guard does the following announcements in a variety of orders and not very often and very reluctantly =====
Apologies for the delays tonight
Apologies for the delays tonight it is because of signal failures
Apologies for the delays it is because of a fire in Farnham
Sorry for no information, I can’t get through on the telephone for an update
The managers say that you have to stay on the train
The engineers are fixing the signals
We have too many trains in the station
There are no trains going in or out of Woking – (except that one)
Then – after 2 hours of being stuck on your train, the lights and air go off. No explanation. After ten minutes, the guard says that he is going to open the central doors to help air movement – he doesn’t. He also mentions that he has a drink and a sandwich, but does not offer to share – we have no food or water, people start getting ill.
==== the air is off for well over an hour – that’s the air conditioning that enables a packed train to breathe – imagine that =====
The announcements start up, and this time the driver joins in – once – (note he didn’t when it was the signal failure being blamed)
We are told:
- If we get off the train we will be arrested
- That police are walking the track and waiting at Woking to arrest anyone who attempts to leave the train – we see no sign
- That *passengers* have broken out of the trains and are roaming the tracks therefore we can’t move – I see no passengers, or indeed any activity outside
- That the manager cannot organise a manual evacuation as there are not enough staff (sorry I pay *how much*?)
- That the reason we have no light or air is because of these (alleged) passengers on the track – again, no one I was tweeting (and there were many), saw any passengers leave any train or on any track
We were told the above in such authoritative and scary ways that we were all too frightened to ask if a door could be opened so we could breathe. By this point we have been stuck in your train for 4 hours and we have had no light or air for 2 – TWO HOURS… and FIVE HOURS in total on that damned carriage.
Incidentally, while the guard was telling us that Woking station was swarming with policemen waiting to arrest us, and that were it not for our fellow passengers escaping from these trains of hell we would all be home and dry, a fellow passenger who was on the phone to a friend waiting at Woking station told us that there was no train movement in Woking, so we were not going anywhere anyway. The one van load of police who had turned up had left an hour beforehand, and there was nobody there other than concerned friends and family waiting to collect people from the trains. Not your staff, of course; they were not there.
We considered briefly calling the fire brigade – or Dominos – but Dunkirk spirit prevailed and we were all terribly English and sympathetic with the “poor” guard who was hogging his sandwiches and water and threatening us with the police – I have to say I was not so sympathetic, I thought he needed to go on a special course for awful people to use in emergencies.
At 22.59:45 we crept into Woking station, somehow the train staff had managed to make us all feel cowed and responsible for the whole debacle because some of our peers had allegedly broken out of a train – and quite frankly who could have blamed them – although I don’t believe it for a minute – I was there and I saw no one – nor any police or arrests – total blarney.
Therefore no one expressed any surprise that, after being trapped in your airless, oppressive, food- and water-free train (that we paid to go on), you guys had not bothered, in the 5 hours we were all stranded on the tracks between Waterloo and Woking, to lay on some staff, food, drink or forms for when we piled out at 11pm in Woking, feeling a bit guilty for the trouble we had caused, even though I and all my fellow passengers and fellow tweeters had done nothing more than swear at you a bit on twitter. No one had ACTUALLY left the train… no one, and I don't believe they ever did. But IF they did, who could blame them, quite frankly.
This authoritative and shouty voice that was relayed to us as “the SWT managers say you must not leave the train or we will arrest you and we have police patrolling the tracks…” etc which tried to keep us in our place, did not have the courage to meet us off these trains of hell and try to at least make sure we had a glass of water. NOT ONE of you bothered – and even the station staff had fled, just leaving the gates open – not ONE official was on Woking station, not one that I saw. Where was the help and apology?
So I suggest that you do the following:
1. Install emergency water and food on every carriage
2. Have a phone on every carriage that in such emergencies people can use to call loved ones/get on the Internet
3. Train your staff to explain always what is happening and why we have to stay in carriages
4. Have a way to ensure that there is light and air in an emergency
5. Ensure that if ever a train is stuck for longer than an hour, that there is a crack team to help at the first receiving station
6. Have a manual signalling system to take over from faulty signals
7. An arrangement with the local fire brigade to evacuate passengers held for longer than an hour in a safe way
8. If this was indeed all caused by an attempted cable theft, tell us so from the start, and don't run trains when there has been such a cable theft – although I don't entirely believe this convenient story
One point I must make is that the person or persons manning the twitter feed on @NRE_SWT were calm, informative, personal and not threatening. This was a job well done by those manning twitter, and they should be commended. That is, until my phone battery died after 3.5 hours on your train of hell.
*update* I am told by a fellow passenger, who is heavily pregnant, that she did get off the train, but only after she was told that the power was off on the track, which would mean a decent amount of time after we had been trapped in the airless, light-free carriages <- a convenient excuse for SWT to hold onto, in my opinion. What caused such panic amongst passengers that they would risk their lives and those of their unborn children? This is panic, not rationality, and cannot be blamed for what had happened beforehand.
Percona is glad to announce the release of Percona Server 5.5.12-20.3 on June 9, 2011 (Downloads are available here and from the Percona Software Repositories).
Based on MySQL 5.5.12, Percona Server 5.5.12-20.3 is now the current stable release in the 5.5 series.
Other Changes
- The list of authors of the plugins used has been corrected. Bug Fixed: #723050 (Y. Kinoshita)
For more information, please see the following links:
- Downloads: Binary distribution, Percona Software Repositories
ERRATA: Due to a misunderstanding, bugs already fixed in the previous release (5.5.11-20.2) were included in the Release Notes of this version. They have been removed for consistency; I apologize for any inconvenience.
After the success of Percona Live in New York, we have published the presentation slides of talks from the leading experts in MySQL on our website. Check it out.
#5 This post is part of the Velocity countdown series. Stay tuned for the articles to come.

It's been over a year since the launch of perfplanet.com. So far it looks good and useful for people. Sergey "ShowSlow" is also doing a great job of tweeting as @perfplanet about news from the perfplanet pipes, as well as other interesting happenings in our perf community. Good stuff.
From the beginning I was sure I'm not including all blogs that deserve it. And new ones come up. So I said - send me an email, I'll look around the blog and add it to the planetarium. Problem with that is that the process was cumbersome. I don't always have the time (or am just being lazy because the process is kinda involved). Namely - update a yahoo pipe and update an html page with the list of blogs.
So I decided to take a few hours tonight to remedy the situation. I thought long and hard for what must have been a whole minute and the solution that came was - GitHub.
On GitHub
All the code is now updated, there's a bit of build process, minification and such and the code is now on GitHub, yeey. So if you send a pull request, I just accept it, run the build script and update the site.
Contributing
There is this "planetarium.json" file:
https://github.com/stoyan/perfplanet/blob/master/tools/planetarium.json
All you have to do is update this file, add your (or your favorite) blog for syndication and that's that. Or delete a spammy or irrelevant blog.
When adding a feed URL, try to find the "performance" related category. Because, sadly, not everyone is all that interested in other people's cats, as they are in other people's performance thoughts.
E.g. Ben Cherry's feed is:
http://feeds.feedburner.com/adequatelygood/
But we're only interested in the posts tagged "performance":
http://feeds.feedburner.com/adequatelygood/performance
There are exceptions, of course, some folks only talk about performance.
i18n
About internationalization: talk to me. There is currently an fr.perfplanet.com, not necessarily maintained. But if you want to aggregate blogs in your language, you can just clone the GitHub project and maintain your list of blogs, and I'll set up your-lang.perfplanet.com and start pulling updates from GitHub.
Build script
So first, I switched from Yahoo Pipes to YQL, because the aggregation request can be generated from a list of URLs: no need to use a UI, log in to Pipes, etc.
Other than that, I added a build script (in JavaScript, yes!) that does this:
- it takes the JSON list of blogs, an HTML template, CSS and JS
- generates index.html with inline minified CSS and JS using cssmin.js and jsmin.js. Gotta minify, gotta save requests
- also in the index.html there's a list of blogger names and URLs generated from the JSON
- generates an up.sh ("up" as in update) - this is a curl call with generated YQL query. This file is executed by a cron job every hour to read new blog posts and write data.js
- data.js is then used in index.html to display the content
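The real build script is JavaScript and its code isn't shown here, so the following is only a rough Python sketch of two of its steps: turning the blog list into one YQL query plus an up.sh curl line, and rendering the blogger list. Assumptions, all hypothetical: each planetarium.json entry has `name`, `url` and `feed` keys; the query shape `select ... from rss where url in (...)` and the query.yahooapis.com endpoint are from memory of YQL (the public YQL service has since been shut down); and whether the real up.sh wraps the response for data.js isn't stated, so this just redirects the JSON.

```python
from html import escape
from urllib.parse import quote

# Assumed public YQL endpoint (the service has been retired).
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql(feed_urls) -> str:
    """One query that pulls every feed in a single request."""
    quoted = ", ".join("'" + u.replace("'", "\\'") + "'" for u in feed_urls)
    return f"select title, link, pubDate from rss where url in ({quoted})"


def build_up_sh(blogs) -> str:
    """Body of an up.sh-style script: a single curl call writing data.js."""
    query = build_yql(b["feed"] for b in blogs)
    # quote() percent-encodes spaces, commas and single quotes, so the URL
    # is safe inside the single-quoted shell argument below.
    url = f"{YQL_ENDPOINT}?q={quote(query)}&format=json"
    return f"#!/bin/sh\ncurl -s '{url}' > data.js\n"


def build_blogger_list(blogs) -> str:
    """The list of blogger names and URLs that goes into index.html."""
    items = "".join(
        f'<li><a href="{escape(b["url"])}">{escape(b["name"])}</a></li>'
        for b in blogs
    )
    return f"<ul>{items}</ul>"
```

With `blogs = json.load(open("tools/planetarium.json"))` (given the assumed schema), the generated up.sh is what the hourly cron job would run.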
Thanks!
So that's that. Hopefully this way the site will see many more updates, with fresh content and new blogs.
Bugs, etc, welcome.
Contributions to the list of blogs (or anything else really) more than welcome.
Once again - this is the site and this is its GitHub.
#4 This post is part of the Velocity countdown series. Stay tuned for the articles to come.
I'm working on tomorrow's kind of big thing, so will take it easy today, with a stroll down memory lane.
I was clearing up my space at home a few days ago and came across this oldish notepad. In there (among the usual lists of todos and ideas in the spirit of i-wanna-do-this-tool/site/experiment!) I found these early sketches of what has since become YSlow 2.0. They are all still pretty relevant, so why not take a minute to review them and get acquainted with the YSlow internals.
Back at the time Steve Souders and I had just released YSlow 0.9 and Steve had moved to Google. It was the right time to have a quick bugfix-or-two release of YSlow 1.0 and in parallel get cranking on a complete YSlow 2.0 rewrite.
The motivation behind the total rewrite (aside from the usual "I didn't make this mess" ego-driven desire to start fresh and do a better job the second time around) was that we were getting a lot of "meh, these are Yahoo's problems/rules, not yours". For example, a normal, mere-mortal blog with no CDN budget should still be able to aim for an A on most other checks. Another, somewhat forward-thinking (as opposed to reactive) reason was that I was a big fan of letting others contribute rules and checks of their own. The idea was to make YSlow your own tool, not only Yahoo's. For example, if you want to set a rule that there should be no more than 5 images on a page, you should be able to codify this into a rule, and share the rule with the rest of the team or the world. (Here's an example.) Another goal was to decouple the tool from Firebug: make it work without Firebug and even without Firefox, and go back to having a bookmarklet version (Steve's original very first version) as well as versions for other browsers. (Thanks to Marcel Duran, this is also becoming a reality now.)
So the new architecture (a big name for a bunch of objects) was conceived in these sketches while en route on a red-eye flight to Bulgaria. My little kids were asleep, taking over my seat as well, so there I was, standing in the aisle of the plane or sitting on the seat's armrest, scribbling these notes.
The main idea was divide and conquer. Split this monolithic piece of code into smaller components.
When you run YSlow, it starts by "peeling off" the page, extracting all possible information. Hence the Peeler singleton.

Peeler has methods such as getJavaScript() and getDocuments() (as in document + any frames). This can work most anywhere (bookmarklet too). Then if YSlow is running inside Firebug and has access to Net Panel (or any other browser or environment that lets you access stuff happening on the network, not only DOM crawling), it can also find things such as XHR requests or image beacons, which are not part of the DOM, using a NetMonitor listener object of some sorts.
Whatever Peeler finds, it sticks into a ComponentSet which is just an array of components along with some convenience methods such as getComponentsByType('css').
Moving on, the ComponentSet contains Component objects which have all the data, like headers, type, content, URL, the whole thing.
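To make the Peeler → ComponentSet → Component split concrete, here is a toy Python sketch. It is not YSlow's code (YSlow is JavaScript); the snake_case names only mirror Peeler and getComponentsByType(), the peeler is a module-level function rather than a singleton, and it only does the DOM-crawling part, reading `<script src>`, stylesheet `<link>` and `<img>` tags from an HTML string with the standard-library HTMLParser. There is no NetMonitor, so XHRs, beacons and response headers are not captured.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Component:
    """One page component: URL, type and (when known) response headers."""
    url: str
    type: str                      # e.g. "css", "js", "image"
    headers: dict = field(default_factory=dict)


class ComponentSet:
    """An array of components plus convenience lookups."""

    def __init__(self):
        self.components = []

    def add(self, component: Component) -> None:
        self.components.append(component)

    def get_components_by_type(self, ctype: str) -> list:
        return [c for c in self.components if c.type == ctype]

    def __len__(self) -> int:
        return len(self.components)


class _MarkupPeeler(HTMLParser):
    """Collects external resources referenced by the markup."""

    def __init__(self):
        super().__init__()
        self.found = ComponentSet()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if tag == "script" and a.get("src"):
            self.found.add(Component(a["src"], "js"))
        elif tag == "link" and "stylesheet" in rel and a.get("href"):
            self.found.add(Component(a["href"], "css"))
        elif tag == "img" and a.get("src"):
            self.found.add(Component(a["src"], "image"))


def peel(html: str) -> ComponentSet:
    """Peel a page: extract everything we can see into a ComponentSet."""
    parser = _MarkupPeeler()
    parser.feed(html)
    parser.close()
    return parser.found
```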

K, now we have a bunch of components waiting and willing to be inspected. To make this inspection as lego-like as possible, there's no big-ass inspector, but there are many little Rule objects. Each Rule object has a bunch of properties like name, URL with more info, etc, but the main thing is - it needs to implement a lint() method. The lint() method takes a reference to the ComponentSet and then returns a Result object.
The Result objects are fairly simple - they have a grade/score, message and optionally a list of offending components (e.g. images without Expires header). A bunch of result objects make a ResultSet which has methods to get the final total score.
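A rough Python sketch of one Rule's lint() producing a Result, using the "images without Expires header" example. Assumptions: a plain list stands in for the ComponentSet, 10 points per offender is an arbitrary default (the text only says this is configurable), and the 10-points-per-letter grade mapping is hypothetical rather than YSlow's actual thresholds.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    url: str
    type: str
    headers: dict = field(default_factory=dict)


@dataclass
class Result:
    score: int                     # 0..100
    message: str
    offenders: list = field(default_factory=list)

    @property
    def grade(self) -> str:
        # Hypothetical score-to-letter mapping, one letter per 10 points.
        for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60), ("E", 50)):
            if self.score >= floor:
                return letter
        return "F"


class ExpiresRule:
    """Toy version of an "Add Expires headers" rule."""
    name = "Add Expires headers"

    def __init__(self, points_per_offender: int = 10):
        self.points_per_offender = points_per_offender

    def lint(self, components) -> Result:
        # Header names are case-insensitive, so compare lowercased names.
        offenders = [
            c.url for c in components
            if "expires" not in {h.lower() for h in c.headers}
        ]
        score = max(0, 100 - self.points_per_offender * len(offenders))
        message = f"{len(offenders)} component(s) without an Expires header"
        return Result(score, message, offenders)
```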

A bunch of Rule objects go into a RuleSet. The idea is to mash those up as you wish. A Rule object is, for example, "Use CDN" (it's also configurable, e.g. how many score points to take away for each offender). Also, within a RuleSet you can define the relative weight of each Rule: e.g., is an F on the "Expires" rule as bad as an F on "CSS expressions"? You can create your own RuleSets (e.g. "Small blog"), including and configuring any of the existing rules you like, and also add more custom Rules. It's one big happy pool of Rules to pick from and configure. In fact, YSlow 2.0 shipped with three rulesets: the new one with more rules, the old yslow1 and a "small site or blog" one.
At the end there is one central lint() method which takes a RuleSet, loops over the Rules in it, calls each Rule's lint() and collects the results into a ResultSet.
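The RuleSet weighting and the central lint() loop can be sketched in a few lines of Python. This is an illustration under assumptions, not YSlow's implementation: components are plain dicts, `MaxOfTypeRule` is a hypothetical configurable rule (the "no more than 5 images" idea from earlier), and the ResultSet total is assumed to be a weighted average of the rule scores.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    rule: str
    score: int
    offenders: list = field(default_factory=list)


class MaxOfTypeRule:
    """Hypothetical configurable rule: at most `limit` components of a type."""

    def __init__(self, ctype: str, limit: int, points_per_extra: int = 10):
        self.name = f"At most {limit} {ctype} component(s)"
        self.ctype = ctype
        self.limit = limit
        self.points_per_extra = points_per_extra

    def lint(self, components) -> Result:
        matching = [c["url"] for c in components if c["type"] == self.ctype]
        extra = matching[self.limit:]          # everything over the limit
        score = max(0, 100 - self.points_per_extra * len(extra))
        return Result(self.name, score, extra)


class RuleSet:
    def __init__(self, name: str, weighted_rules):
        self.name = name
        self.weighted_rules = list(weighted_rules)   # [(rule, weight), ...]


class ResultSet:
    def __init__(self):
        self.entries = []                            # [(Result, weight), ...]

    def add(self, result: Result, weight: int) -> None:
        self.entries.append((result, weight))

    def total_score(self) -> int:
        total_weight = sum(w for _, w in self.entries)
        if total_weight == 0:
            return 100
        return round(sum(r.score * w for r, w in self.entries) / total_weight)


def lint(components, ruleset: RuleSet) -> ResultSet:
    """Central driver: call every Rule's lint(), collect a ResultSet."""
    results = ResultSet()
    for rule, weight in ruleset.weighted_rules:
        results.add(rule.lint(components), weight)
    return results


# Usage: 7 images and 1 stylesheet against a "Small blog" ruleset.
page = [{"url": f"img{i}.png", "type": "image"} for i in range(7)]
page.append({"url": "site.css", "type": "css"})
small_blog = RuleSet("Small blog", [
    (MaxOfTypeRule("image", 5), 3),   # images weigh three times as much
    (MaxOfTypeRule("css", 1), 1),
])
print(lint(page, small_blog).total_score())  # 85
```

The image rule scores 80 (two extra images) and the CSS rule 100, so the weighted total is (80 × 3 + 100 × 1) / 4 = 85.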

From there it's a question of rendering the ResultSet, grades, offenders, etc. Additionally there are tools that can run on the ComponentSet (e.g. JSLint) and stats. In addition to the YSlow UI, you should be able to render these results in any way you like, including exporting a JSON or whathaveyou.

Whew!
I may have missed some details but that's about all there is to the core of YSlow 2.0
Here's also a presentation that talks about these things and offers some diagrams that hopefully clarify even further
Thanks for reading!
That was it for today, only 4 days to go to Velocity. Hope you learned something you can use and you're ready to start coding your own rules and create rulesets to customize what YSlow can do for you.
To stay connected, there's now a Facebook page for YSlow and there's always the YDN (Yahoo! Developer Network) section about YSlow
#3 This post is part of the Velocity countdown series. Stay tuned for the articles to come.
Without further ado, please point your browser to the newborn bookofspeed.com.
It's a free (public domain), online, open-source, not yet finished, book about web performance.
Contributions welcome
The source files are on Github - https://github.com/stoyan/Book-of-Speed. I'll be glad to receive any errata, technical mistakes, requests, grammar checks, anything really. Just edit the stuff in /src and send a patch. /src is the text for the chapters alone, then what you see on the site and in the main directory - TOC and chapters - are generated by a build script (of course a javascript).
How did we end up here
Year and half ago I did this Performance advent calendar experiment (since moved to a new home), writing an article a day for 24 days (sounds vaguely familiar?). PeachPit press approached me about publishing a book based on those. PeachPit publishes mostly web design books (like Designing with Web Standards) and I thought designers should know about performance. Also business folks, product managers. So why not write something more accessible and less technical?
"Speed Matters" was the title.

Fast forward... I kept missing deadlines (a favorite thing, ask Douglas Adams) until eventually after 5 and a half chapters out of 9, the publisher decided to cancel the project. Fair enough. Wasn't meant to be. We're grown ups, no hard feelings. (Well, I did try to save the project by suggesting Marcel Duran who now works on YSlow to finish it, to which PeachPit expressed interest initially but then didn't bother to follow up with a comment or explanation)
So instead of letting PeachPit keep the content and maybe publish it on their site, I decided to keep the chapters and return the royalty advance they had given me. After all, I had wanted to try self-publishing for some time.
Fast forward again... I didn't do anything further. Changing computers, failing disks and non-existing backups convinced me I should let this content free sooner. "Information wants to be free". So I managed to restore from emails (but not the images, had to copy images from Word) and thought the Velocity countdown is a good excuse to release this thing.
I mentioned the project to my good friend and designer Yavor two days ago; he had a few free cycles and sent me a mock. Awesome! The only "brief" I gave him was "it's to be a free online book, like diveintohtml5.com and eloquentjavascript.net". And here's what he came up with; how cool is that! (Oh, and I gave him a turtle drawing, see below.)

(As you can see, he's so humble he doesn't want any credit on the site. But this is my blog and I can give credit as much as I want now, can't I?)
So last night between writing last night's post and today, I turned this mock into HTML (not fully complete, missing ego-header and pagination) and converted the 5 chapters I have so far from word docs to HTML.
Audience
If you follow my blog there isn't much new for you. Like I said, the audience was to be less technical. But there are a few new never-before seen bits and pieces.
Assuming the html-writing part of the PeachPit audience will be still very attached to XHTML, I decided to do what I generally tend to avoid - closing tags, using type="text/javascript" etc. Further edits should convert these to more compact html5-allowed syntax.
In the markup for the site, though, in the rush to convert everything, I stopped closing P and LI tags to save time.
Feel free to send a patch.
No credits
I was planning on having one round of credit-giving, either as footnotes or an appendix, once the book is done. But the book is not done, so forgive me if I haven't given you credit where it was due.
No links
It's silly to have no links in an online publication, but given the rush, I didn't edit the content at all to add them. Again, I was planning on an appendix, or actually a companion site. Will do. Will accept a patch.
On editing
My editor from PeachPit sent me notes and edits. These are not in the online edition. Partly because I don't think it's fair (what's in it for them?) and partly because, trivially, I didn't have the time.
On reviewing
I got technical reviews from Marcel Duran and Sergey Chikuyonok while working on the book. I haven't incorporated their feedback yet. Will do.
(Sergey said my chapter on image optimization was too basic. It is, especially compared with his articles on Smashing Magazine and his blog.)
But Annie Sullivan from Google went way above and beyond any review I have seen. She actually read the chapter with her husband (not technical) and explained to him what's going on. So I had very eye-opening observations and I'm grateful and indebted for this.
(As you guessed, the feedback is not yet reflected in the text)
PageSpeed
PageSpeed runs on Dreamhost, where the site is hosted. So I thought I should tick the "use pagespeed" option in DH's panel. Not bad, not bad at all: having your images and other stuff taken care of for you automagically. I have a 99/100 Page Speed score and 94/100 in YSlow.
I do minify CSS myself, though, and inline it, because it's small.
Turtle
I couldn't use the turtle (nor the title) from Speed Matters. But my kid drew a turtle in drawing class, so I thought I should use it. Here's what it looked like before my designer friend took over:

Happy reading!
Like I mentioned, regulars on this blog won't find much new information, but feel free to send your junior team members to learn from this free resource.
And don't forget to send patches - book editing via GitHub sounds pretty nice to me.
Once again - here's the book of speed and here are its source files.
#2 This post is part of the Velocity countdown series. Stay tuned for the last one tomorrow.

With only 2 days to Velocity, it's time to drop in the quality of these posts (but the one tomorrow will be great, I promise) with today's announcement of the immediate availability of the project called http://sultansofspeed.com.
I think we've had enough of experts, gurus, ninjas, jedis, pirates and overlords. Time for the sultans to step in!
So there: a slideshow of bios and photos of a number of Web Performance Sultans.
The background music is my heavy metal cover (sorry!) of "Sultans of Swing" by Dire Straits.
The Sultans you see there are the people who have written for the Perfplanet Calendar. But this is just the initial seed. (And because these are the bios/photos I have easy access to.)
Are you a sultan? Add/delete/edit your bio in the Github repo in the sultans.js file.
Want to change something - better slideshow maybe? Yes, the repository is still there.
In the immortal words of Mark Knopfler:
And he makes it fast with one more thing:
"We are the sultans, yeah the sultans of speed"
Another video about social media, but for some people the message does need to be rammed home.
I wrote about Galera about 1.5 years ago: State of the art: Galera – synchronous replication for InnoDB. It was about the 0.7 release, which was more like a proof-of-concept release (though Galera's developers may not agree with that) with some serious limitations (like using mysqldump for node propagation). The Galera team heard my suggestions, and the new 0.8 release looks very promising. Well, it took 1.5 years to fix the limitations and come up with new features, but there is nothing to complain about: a synchronous distributed transactional system is not an easy problem to solve, trust me.
So Galera 0.8 comes with many nice features:
- Works with MySQL 5.1 and MariaDB 5.1. The latter is more interesting for us, as it is based on XtraDB, which means Galera supports the XtraDB storage engine.
- Support for multi-threaded slaves.
- Using custom scripts for node propagation. An RSYNC method comes with the Galera distribution, and it is quite easy to add support for Percona XtraBackup to propagate nodes.
Why is MySQL/MariaDB + Galera 0.8 interesting? It allows us to solve the following problems:
- A real High Availability solution for systems based on InnoDB/XtraDB. The recommended setup is 3 nodes, and you can add/remove nodes almost transparently.
- It is possible to use it in traditional master-slave setups, but with a big difference: with Galera we have semi-synchronous slaves. How it works on slaves: slaves just acknowledge reception of a network packet (not a transaction). The transaction is not guaranteed to be applied; what is guaranteed is that every node will do exactly the same thing with it. With parallel applying on slaves, the latency of round-trip transactions should be in an acceptable range.
- It opens the possibility of active master – active master setups. You can write on both masters without worrying about conflict resolution, and get rid of those “slave is out of sync with master” pain-in-the-neck problems.
- Combining all of the above, we can now set up distributed replication systems with masters in different data centers. This provides an HA solution for MySQL setups in the Cloud. For example, with the current state of MySQL, EC2 setups suffer from the lack of a good HA scheme. With Galera, we can set up replication with the same availability across different zones. Remember the recent and famous EC2 outage? Having masters in the US and Europe regions would solve this kind of problem.
- Scaling writes. From benchmarks provided by the Galera team, we can see good throughput scaling when writing to several nodes.
As you may see, I am pretty excited that this solution is available for MySQL users as Free / Open Source Software.
“Where is the catch?”, you may ask. Yes, there are a couple of points to consider:
- Complexity of setup: I have been playing with Galera 0.8-pre for the last couple of weeks, and from my observation, setting up a three-node cluster is much more complex than an average MySQL master-slave setup.
- Potential performance penalty: I am going to run various benchmarks to get performance numbers, but I expect that for simple master-slave setups the response time and throughput will be affected (and not for the better). But this is the price to pay for synchronous slaves that never fall behind. With additional nodes, the response time will only increase. And in multi-node setups, the performance of the whole cluster will be defined by the slowest server, so it is recommended to have uniform servers across the cluster.
From my experiments with MariaDB/Galera 0.8, I have one serious feature request for the Galera team: provide the ability for incremental node provisioning.
By this I mean the following: right now, if a node gets disconnected from the cluster, it has to copy the whole data set again in order to rejoin. But if it was disconnected only for a short period of time, we may want to copy only the changes made during that period. I believe that, by integrating with Percona XtraBackup and its incremental backup features, incremental node provisioning is possible.
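The feature request boils down to a simple decision on the donor side. The Python sketch below is hypothetical: the `Donor` class, its in-memory log of recent write-sets and its capacity are illustrative assumptions, not Galera internals. If every write-set the rejoining node missed is still retained, only those are sent; otherwise the donor falls back to copying the whole data set.

```python
from collections import deque


class Donor:
    """Hypothetical donor node that keeps its most recent write-sets in memory."""

    def __init__(self, capacity: int):
        self.log = deque(maxlen=capacity)  # (seqno, write-set), oldest first
        self.seqno = 0                     # last committed sequence number

    def commit(self, writeset: str) -> None:
        self.seqno += 1
        self.log.append((self.seqno, writeset))  # oldest entry drops off when full

    def provision(self, joiner_seqno: int):
        """Decide how to bring a node that last applied `joiner_seqno` up to date.

        Returns ("incremental", missed_writesets) when everything it missed is
        still in the log, otherwise ("full", None): copy the whole data set."""
        if joiner_seqno >= self.seqno:
            return ("incremental", [])
        oldest = self.log[0][0] if self.log else self.seqno + 1
        if joiner_seqno + 1 >= oldest:
            missed = [ws for seq, ws in self.log if seq > joiner_seqno]
            return ("incremental", missed)
        return ("full", None)
```

For example, a donor created with `Donor(3)` that has committed five write-sets can bring back a node that last applied seqno 2 incrementally, but a node that stopped at seqno 1 needs a full copy, because write-set 2 has already been discarded.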
To wrap up this post, let me invite you to join me in testing MySQL/Galera 0.8; binaries are available from Launchpad.
The downside of MySQL/Galera is that it is based on standard InnoDB 5.1 instead of the InnoDB plugin. Standard InnoDB is seriously behind the InnoDB plugin in terms of features and performance. An answer to this is MariaDB/Galera, which is based on XtraDB, but it is only available as source code, and you may need a system based on Red Hat 6 or similar to compile it. There is also a helpful wiki page with a bunch of information about Galera replication.
#1 This guest post from Billy Hoffman is the last post in the Velocity countdown series. Velocity starts first thing tomorrow! Hope you enjoyed the ride and please welcome Billy Hoffman!
Billy Hoffman (@zoompf) is the founder and CEO of Zoompf, a web performance startup whose scanning technology helps website owners find and fix performance issues which are slowing down their sites. Previously Billy was a web security researcher at SPI Dynamics and managed a research team at HP. He can open a Coke can without using his hands.
(tl;dr: Images make up the majority of the Internet, yet we consistently fail to apply the most basic of optimizations. Even big sites like Twitter are completely screwing this up. Furthermore, there are huge unexplored areas when it comes to image optimization which would provide significant savings. We should stop worrying about esoteric performance optimizations when there is so much other low hanging fruit.)
Images constitute the bulk of content on the Internet, both in terms of content size and number of resources. Using data from the wonderful HTTP Archive we see that 60% of the bytes that make up an Alexa Top 1000 website are images. The average webpage references 81 external resources, and 64% of these are images. And the dominance of images is growing. In the last 6 months, total page content size increased by 70 kB. 75% of that increase (52 kB) came from images.



We know that lossless image optimization tools reduce content size anywhere from 5-20%. Occasionally you will see 70% savings or more, but that only happens when the image contains an embedded thumbnail. That level of savings doesn't sound all that impressive. After all, HTTP compression can save 60-70% on real-world text resources like HTML, JavaScript, or CSS. However, text resources only make up on average 188 kB, or 24% of total content size. Saving 66% on 24% of the content trims about 16% of total bytes, while 5-20% savings on 60% of the content trims 3-12%: the same order of magnitude. In fact, if you could reduce images by 25%, that would cut total content size by about 15%, roughly as much as HTTP compression does!
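The percentages are easy to check. This Python snippet just multiplies each resource type's share of page bytes by a savings rate; the shares are the HTTP Archive figures quoted in the paragraph above, and nothing else is assumed.

```python
# Shares of total page bytes quoted above from the HTTP Archive.
TEXT_SHARE = 0.24   # HTML, JavaScript, CSS (~188 kB)
IMAGE_SHARE = 0.60  # images


def total_saved(share: float, savings: float) -> float:
    """Fraction of total page bytes removed by shrinking one resource type."""
    return share * savings


for label, share, savings in [
    ("HTTP compression, 60%", TEXT_SHARE, 0.60),
    ("HTTP compression, 66%", TEXT_SHARE, 0.66),
    ("HTTP compression, 70%", TEXT_SHARE, 0.70),
    ("lossless images, 5%", IMAGE_SHARE, 0.05),
    ("lossless images, 20%", IMAGE_SHARE, 0.20),
    ("images, 25%", IMAGE_SHARE, 0.25),
]:
    print(f"{label:>22}: {total_saved(share, savings):.1%} of total bytes")
```

Compression on text removes roughly 14-17% of all bytes, and a 25% image reduction removes 15%; any image reduction above about 26.5% beats the 66% compression case.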
If you work in front-end performance, none of this should be a surprise. Obviously any front-end performance strategy needs to include image optimizations. Image optimization is an old topic. Shouldn't we instead be focusing on more esoteric optimizations, like refactoring CSS rules so that external fonts render faster on Blackberry Webkit? No, we shouldn't, because sadly we collectively suck at optimizing images.
Give PNG a Chance? Nope.
One of the most basic image optimizations you can make is converting GIFs to PNGs. PNGs can do everything that GIFs can do and more, and the browser issues with PNGs are largely a problem of the past. Even without applying additional lossless optimization tools to the PNG, converting a GIF file will almost without exception create a smaller PNG. This is because the way graphics data is compressed in a PNG file, using the DEFLATE algorithm, is fundamentally more efficient than GIF's LZW compression scheme. Once you apply lossless tools to the converted PNG, it gets even smaller. Animated GIFs are the exception here, as PNGs are not animated and the alternatives for simple animations (MNG, Flash) are either not widely supported or result in larger files. So what is the breakdown of image formats on the web today?

37% of images on the Alexa Top 1000 websites are GIFs. That makes no sense given what we know about PNGs versus GIFs. 37% of the images on the Internet are not animated "Under construction" icons or Ajax status thumper animations. People are not being intelligent about the file formats they use for images.
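Measuring a breakdown like this yourself is mostly a matter of sniffing file signatures. Here is a minimal Python sketch, assuming you already have the downloaded image bytes (crawling and fetching are out of scope). It is not how the HTTP Archive or Zoompf classify images, just one straightforward way to do it.

```python
from collections import Counter

# File signatures ("magic numbers") at the start of each format.
SIGNATURES = [
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]


def sniff(data: bytes) -> str:
    """Identify an image format from its leading bytes rather than from its
    URL extension or Content-Type header, since both can be wrong."""
    for magic, fmt in SIGNATURES:
        if data.startswith(magic):
            return fmt
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "other"


def breakdown(images: list[bytes]) -> dict[str, float]:
    """Share of images per format, e.g. {"gif": 0.37, ...}."""
    counts = Counter(sniff(img) for img in images)
    total = sum(counts.values())
    if not total:
        return {}
    return {fmt: n / total for fmt, n in counts.items()}
```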
The Internet, now with more bloat!
Applying lossless image optimization tools is one of the simplest optimizations to do. Take an image, run a program, get an optimized image. Stoyan and I love optimizations like this because they are so easy to automate. Just add a step to the website build process, or to your staging-to-production publishing process, that automatically optimizes images. It should be transparent, something you set up once and forget about. So how are we doing?
82% of Alexa Top 1000 websites contain images that were not losslessly optimized. Applying lossless optimizations across all the images from the Alexa Top 1000 would reduce file size by an additional 15%.
Surely it's just a few smaller sites that aren't properly optimizing images and are pulling down the statistics, right? Sadly, no. Twitter, the ninth largest website in the world by traffic, doesn't losslessly optimize any of its images. 33% of total page load bytes could be eliminated solely by applying lossless image optimization. Let me phrase that a different way: 1 out of every 3 bytes Twitter sends you is unnecessary! This is an incredible waste.
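As an illustration of how mechanical lossless optimization can be, here is a minimal Python sketch (standard library only) of one technique such tools use: merging a PNG's IDAT chunks, re-running DEFLATE at the highest zlib level, and dropping text and timestamp metadata chunks. Pixel data is unchanged. Dedicated tools such as OptiPNG go further (for example by trying different scanline filters), and the set of chunks treated as safe to drop here is an assumption you should adjust.

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"
# Metadata-only chunks assumed safe to drop for web delivery (an assumption;
# keep them if you need the embedded text or timestamps).
DROPPABLE = {b"tEXt", b"zTXt", b"iTXt", b"tIME"}


def chunks(data: bytes):
    """Yield (chunk_type, payload) pairs from a PNG file."""
    if not data.startswith(PNG_SIG):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIG)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + payload + CRC


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Serialize one chunk, including its CRC over type + payload."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def recompress_png(data: bytes) -> bytes:
    """Losslessly shrink a PNG: merge all IDAT chunks into one, re-DEFLATE the
    image data at zlib level 9 (written just before IEND) and drop
    metadata-only chunks. Returns the input unchanged if nothing was saved."""
    out = [PNG_SIG]
    idat = bytearray()
    for ctype, payload in chunks(data):
        if ctype == b"IDAT":
            idat += payload
        elif ctype in DROPPABLE:
            continue
        elif ctype == b"IEND":
            raw = zlib.decompress(bytes(idat))
            out.append(make_chunk(b"IDAT", zlib.compress(raw, 9)))
            out.append(make_chunk(ctype, payload))
        else:
            out.append(make_chunk(ctype, payload))
    result = b"".join(out)
    return result if len(result) < len(data) else data
```

A build step could run `recompress_png()` over every PNG before publishing; since it returns the original bytes when it cannot do better, running it repeatedly is harmless.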
Unplowed Fields
It's clear we are not applying the image optimizations we already know about. However, there is much more work to be done with images. This is a largely unresearched and under-advocated area that needs more attention.
Consider choosing the correct image format. Are people saving images as PNGs when they should be saved as JPEGs? Indeed they are. Tumblr's background image is a 76 kB PNG image, and it would be 33 kB (55% smaller) if it were a JPEG. This is better than their old 827 kB PNG background image, which would be 47 kB (94% smaller) as a JPEG. Unfortunately, I know of no tool besides Zoompf's free performance scan that identifies PNG images that are candidates for conversion to JPEG.
What about JPEGs saved with a high quality setting? This is a large enough topic for its own blog post. To quickly summarize: JPEG "quality" is an arbitrary, non-linear scale; quality is not a percentage of anything, and "quality of 80" does not mean "discard 20% of graphics data." Thought leaders like Adobe recommend a quality setting of 70-80 for JPEGs published on the web. Zoompf found that 36% of Alexa Top 1000 images have a quality setting over 80, and reducing them to quality 70 would on average reduce image size by 48%! While not all of these images can be reduced in quality, surely some of them can. Again, this is an area that needs more attention, more best practices and guidance, and more tools to help validate.
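One common way to estimate the quality setting of an existing JPEG is to compare its luminance quantization table with the standard table from the JPEG specification, which libjpeg (IJG) scales by quality. The Python sketch below does that. It assumes the file was written by a libjpeg-style encoder, so estimates for encoders that use their own tables are not meaningful, and it is not necessarily how Zoompf measured quality.

```python
from typing import Optional

# Standard luminance quantization table (JPEG spec, Annex K), which libjpeg
# scales by quality. Only the sum is used, so zigzag order does not matter.
STD_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def ijg_table(quality: int) -> list[int]:
    """Luminance table a libjpeg-style baseline encoder writes for quality 1-100."""
    quality = min(max(quality, 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(max((q * scale + 50) // 100, 1), 255) for q in STD_LUMA]


def estimate_quality(table: list[int]) -> float:
    """Invert libjpeg's scaling using the ratio of table sums."""
    scale = 100.0 * sum(table) / sum(STD_LUMA)
    return (200 - scale) / 2 if scale <= 100 else 5000 / scale


def luma_table(jpeg: bytes) -> Optional[list[int]]:
    """Return quantization table 0 from the JPEG header, or None.
    Walks marker segments and stops at start-of-scan (SOS) or end (EOI)."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    i = 2
    while i + 4 <= len(jpeg) and jpeg[i] == 0xFF:
        marker = jpeg[i + 1]
        if marker in (0xD9, 0xDA):
            break
        length = int.from_bytes(jpeg[i + 2:i + 4], "big")
        seg = jpeg[i + 4:i + 2 + length]
        if marker == 0xDB:  # DQT segment: one or more tables
            j = 0
            while j < len(seg):
                precision, table_id = seg[j] >> 4, seg[j] & 0x0F
                size = 128 if precision else 64
                body = seg[j + 1:j + 1 + size]
                j += 1 + size
                if table_id == 0:
                    if precision:  # 16-bit entries
                        return [int.from_bytes(body[k:k + 2], "big") for k in range(0, 128, 2)]
                    return list(body)
        i += 2 + length
    return None
```

Flagging candidates is then a matter of checking `estimate_quality(luma_table(data)) > 80` for each image.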
Not "Instead of" but "In addition to"
I am not saying other performance optimizations are not important. Zoompf checks for over 380 performance issues, and we are adding more all the time. Many of them are esoteric and low impact. We flag things like duplicate cookies, unnecessary HTTP headers, and even when your <META> tag contains duplicate keywords. However, these checks are for when you have handled all the other important checks. Image optimizations, and research into new image optimization techniques, should not be done instead of other work, but in addition to it. Just remember to prioritize what you are working on so that it will affect the largest number of people in the largest possible way.
Conclusions
Images are a huge component of the web and of modern web performance, and their importance is only growing. Sadly, there are only one or two widely recognized image optimization techniques, and even these most basic optimizations are ignored, forgotten, and not uniformly applied by the largest websites today. Additionally, there are a lot of unexplored areas of image optimization, including lossy image optimization, with no clear recommendations or best practices and virtually no tool support. Some areas for further research include:
- Lossy image optimizations
- Comparison of JPEG encoders
- PNG-to-JPEG and GIF-to-JPEG best practices, recommendations, and processes
- Image quality for Desktop vs. mobile browsing experiences
- Better PNG24 to PNG8 conversion guidelines. (I converted all the figures in this blog post from PNG24 to PNG8 and reduced file size by 52%)
- Viability of WebP and automated delivery to supported browsers
I will be discussing many of these topics this week during my presentation Take it all Off! Lossy Image Optimization at Velocity 2011 on Wednesday. I hope you all can make it.
Today’s guest blog comes from DirectBlinds.co.uk
Over the last 3 years the UK has been in economic turmoil, throwing the UK’s property market into disarray. The major problem for property buyers is that although property prices are falling, the availability of mortgages is at its lowest in well over a decade. Buyers are being asked to save a minimum of 20% for a deposit before even being considered by major banks. The Guardian has pointed out that two-thirds of first-time buyers will have no prospect of owning a property within five years.
The difficulty of raising so much money has led to an increase in demand for rental properties, encouraging landlords to raise rents as demand soars. From 2008 to 2009 rental values increased by a staggering average of 4.9%, but landlords should not use this information to rest on their laurels. Landlords must keep in mind that potential tenants are now savvier than ever when it comes to value.
If a property is not up to scratch or does not appear to be worth its rent, no matter how much demand outstrips supply, tenants will not sign an agreement. More adults are now willing to stay at home or with friends in order to save their much-needed 20% deposit, or while they find a place that has what they need in the right place. Research has shown that 1.6 million people aged 18 to 34 are now doing this, triple the number in 2008.
Landlords can do several minor home improvements that can help justify their rental value. The following advice is designed to be a low cost guide to home improvements:
Kitchen Painting: Painting is cheap, easy and quick – the ultimate home improvement. The kitchen is a pivotal part of any rental negotiation because it’s the social hub of the home. Painting the kitchen will remove any signs of wear and tear, giving the impression that the kitchen is relatively new.
Installing Blinds: When tenants move in, they have to buy a lot of new furniture and smaller items, so if a landlord invests in a set of long-lasting made-to-measure blinds, several sets of tenants over the years will be spared that extra expense. The more you provide a tenant at low cost, the higher the ROI you’re likely to see in rent – especially if you ask tenants what they would like.
Bathroom Grout and Sealant: Before new tenants view a property, bathroom suites and shower units should be re-grouted and resealed. A clever tenant will be aware that a newly grouted and sealed bathroom is unlikely to grow mould and will be easier to keep clean. It also helps give the impression that the bathroom has been recently updated.
The house-building industry has been having ‘secret’ talks with mortgage lenders, according to the Times (Tuesday 31 May), and yes, I actually read the hard copy.
Why these talks are ‘secret’ we can only guess, but the Times has a two-page spread on the issue, so perhaps they are not that secret after all. The story is well known: house builders say they need to build around 230,000 new homes per year, and at the moment they are only building around 100,000. This, they say, is largely down to the banks making it nigh on impossible to get a first-time mortgage.
So a pow wow ensues and we are told that there are positive signs from the meeting. Call me an old cynic, but if the banks won’t listen to the government or the public at large, why will they listen to house builders? Possibly because, being part of the greed culture, the house builders have dangled some carrots. One suggestion reported was that house builders would inject equity into a central fund that would be used by banks to underwrite mortgages for larger amounts, such as 95%.
Whether such a fund could work is debatable. It would, of course, depend upon which house builders came to the party. Securing agreement among many different contributors would, I suspect, be very difficult. The Times rightly points out that not all are in favour of allowing 95% mortgages in any event, and thus banks may not actually be that keen when broaching the subject with their economists. The think tank the Institute for Public Policy Research has indicated that mortgages should be capped at 90% to avoid any more bubbles.
One thing is for sure, however: it is increasingly difficult for first-timers to buy a property in this country. So why bother? We discussed this a few years ago and nothing much has changed, other than it has got harder to buy. Seriously, why as a young person would I want to bother with buying a house? Well, according to leading socioeconomists, the new ‘Generation Rent’ which is being cultured is a very bad thing. Is it really?! Pension provision is put forward as an argument, which has some merit, but times have changed, property has changed and in 30 years’ time the world will all look very different.
I believe this is the beginning of a new property regime and of how we will all live in the future. Look at many European housing models: renting, part-rent/part-ownership and consortium living are the norm. Is this the way we are heading, and does it matter?
All I know is that I won’t be helping the little owlets with large deposits, but then how will I get them to leave the nest – this is the better question and I have a cunning plan:)
It seems like only yesterday that I was playing around with new design ideas for the Qt documentation, trying to bring it into the Y2K age. The feedback was extraordinary, the result was well accepted and usage increased. Still, we wanted to do more.
We wanted the Qt documentation to change from plain HTML files and evolve into a useful tool that would let you find quality content as quickly and easily as possible. It should reflect more use cases relevant to you, and it should let you add your own special tweaks and tips on how to get the most out of Qt. Basically, our mission has been to create a tool that makes the time you spend reading documentation as short as possible.
For us it started with project “Mimir”, whose goal was to create an XML version of the documentation and then bring the documentation to new channels. Without going into too many details, we created a new XML generator in QDoc3 (the Qt documentation tool) that outputs DITA XML (http://dita.xml.org/). DITA XML is a standardized XML architecture especially designed for writing and publishing information on the web. The generator was finished a couple of months ago, and from then on, all we needed was XSLT to transform the whole documentation to the desired format.
The result was an even more powerful DevNet site. The Qt documentation on DevNet (nicknamed “Facedoc”) provides improved navigation, complete search, bookmarking, content rating, categorizing and, last but not least, commenting. The “Add note” feature at the bottom of every doc page lets you write comments with tips and code examples that enrich the documentation, and the rating system shows you the best examples written by other DevNet users. To check it out, go here: http://developer.qt.nokia.com/doc/qt-4.7/
I’ve mentioned QDoc3 before; it is what we use to generate our documentation from the Qt package. This tool is part of the Qt package, and it is used by many to generate the documentation for their own Qt products, both in HTML and for the Qt Assistant tool. The good news for these users is that they can now generate their documentation in the DITA XML format themselves, and so can you.
You run QDoc from the command line like this:
../../bin/qdoc3 ./yourConfigFile.qdocconf
In addition, set the output format to DITAXML in yourConfigFile.qdocconf. You can read all about how to do this in the QDoc manual.
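A minimal Python sketch of that workflow follows: it builds a small config and runs qdoc3 on it. The variable names (`project`, `headerdirs`, `sourcedirs`, `outputdir`, `outputformats`) are assumed from typical qdocconf files, and the `DITAXML` value comes from the text above; check both against the QDoc manual. The project name and paths are placeholders.

```python
import subprocess
from pathlib import Path


def qdocconf_text(project: str, src_dir: str, out_dir: str) -> str:
    """Build a minimal qdocconf asking QDoc for DITA XML output.
    Variable names are assumptions; verify them in the QDoc manual."""
    settings = {
        "project": project,
        "headerdirs": src_dir,
        "sourcedirs": src_dir,
        "outputdir": out_dir,
        "outputformats": "DITAXML",
    }
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


def run_qdoc(qdoc_bin: str, conf_path: Path) -> int:
    """Run QDoc on the config file and return its exit status."""
    return subprocess.run([qdoc_bin, str(conf_path)]).returncode
```

Typical use: write `qdocconf_text("MyProject", "./src", "./ditaxml")` to `yourConfigFile.qdocconf`, then call `run_qdoc("../../bin/qdoc3", Path("yourConfigFile.qdocconf"))`, which mirrors the command line shown above.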
So now that you know what you can do, go mad! Make it your documentation!
The HTTP Archive provides a permanent record of web performance information. It started in October 2010 crawling 1K URLs. This was possible thanks to Pat Meenan’s help providing access to WebPagetest. A month later we increased coverage to the world’s top ~18K URLs. That was good, but the next step is 1M URLs. Today at Velocity I made two announcements that pave the way for achieving this goal.
Starting today the HTTP Archive is part of the Internet Archive. I met Brewster Kahle several years ago and have always admired the work the Internet Archive has done building a “digital library of Internet sites.” When I approached him about this merger we both saw it as an obvious fit. In addition to preserving a record of the content of these sites (via the Wayback Machine) we agreed it’s important to record how that content is built and served. It makes sense that researchers, historians, and scholars be able to find both sets of information under one roof. I’ll continue to run the HTTP Archive project.
The following companies have agreed to sponsor the work of the HTTP Archive: Google, Mozilla, New Relic, O’Reilly Media, Etsy, Strangeloop, and dynaTrace Software. In order to grow to 1M URLs we need data center space, servers, licenses, etc. Thanks to these sponsors we’ve started to build out this infrastructure and will be increasing our coverage soon.
I look forward to working with the Internet Archive on our mission of preserving a record of the Web for generations to come. If you would like to join the effort, I invite you to make a donation to the Internet Archive and contribute your coding skills to the open source project.
Percona Live – New York video recordings are now available on our website. Check it out.
I constantly get asked which app is my favourite of all those built at the many hack days I have run through Rewired State. I am often ashamed that I struggle to answer, although there are many. This is because hack days are rarely about the prototype.
To cover briefly what a hack day is, it is:
- one or two days long (often belying the name)
- any number of developers, for me a minimum of 10 devs are needed to make it buzz a bit, but 20+ makes it exciting
- a subject, challenge, dataset (the broader the better)
- developers are given a brief of the subject or challenge at the beginning of day 1
- they code/design/engineer over the course of a free form period of around 24 hours to create prototype solutions or ideas
- they present back to their hack peers and any inquisitive viewers, as well as the sponsor, client or group who put the event together
- prizes are awarded
- beer and pizza is essential
Many people will not experience a hack day, but if you can, please do. Show and tells are usually open to anyone who wants to attend, and Twitter and Lanyrd are quite good at curating such event information.
However, the reason for this blog post is to explain the point of a hack day, now in 2011 (it will definitely be different in a year’s time, but to chart right now).
If you take a little time to look at the above list of what a hack day is you can understand that the common question might be: yes but what did they make and what happened next?
My response to that is that you are jumping the gun.
What we do at hack days is show you the future. Here’s why.
Why do developers turn up?
Well, in the current climate: API bonkers, information overload (yes devs get that too), tablet shmablet, toy shmoy world that we live in, there needs to be a little peace, as well as a challenge. As I have explained in a previous post about developers, it is up to the rest of the world not to risk developer apathy (already here IMHO) and to look at what really matters.
Developers are simply awesome and if you know one I dare you to go try your million dollar idea out on them – they will have deconstructed and reconstructed it in minutes. Tell them your *save the world* idea and they will probably risk divorce to build it for you – please don’t do this.
Developers who know hack days turn up for the buzz, the competition and to learn, mainly to learn. Those who have never been to one come for the challenge.
I have been running hack days for three years now, and one veteran of the Rewired State hack days was at this weekend’s Hactivate event. He spent the weekend coding a composting app; it’s cool, and you can see it and many more here. But the big thing for him was spending 1.5 hours playing with a web server, in peace, legitimately, on a Sunday (and learning). Another group (and this is usual for a hack weekend) were hack day virgins, and have adopted the amaze-balls face of pride at what they can actually build when challenged by time (hack days are ruthless), as well as taking home the contact details of colleagues who are as talented as themselves, at other stuff.
One developer gave himself this hack weekend as a Father’s day present. To have a weekend to spend with his peers, although coding was his day job, to work on his own projects, surrounded by like-minded awesomes, fed, watered – that’s the point.
Most developers will leave a hack day with new knowledge or at least new contacts, that can lead to extending their ability to deliver the awesomeness.
It’s probably fair to say that most would not admit to being so excited by the non-coder audience blinking at what they have managed to create in a two-day period, nor the prizes showered upon them. And, from those I know, it is always the afterthought – although I am now really clever and spend my life finding flipping brilliant geek prizes that they can’t ignore.
Which is why it is important to understand all this before you ask: what is the point of a hack day?
What’s in it for the non-coders/organisations/brands?
So, there is an immediate and very obvious benefit for anyone engaging more than ten developers on their own idea/API/bit of kit, and hack days seem to be de rigueur. It is not hard to be confident that good things will come of the weekend.
But is it the list of prototypes at the end? That no good hack day host would ever be able to predict?
No, it is engagement with the development community. Gifting your idea/API/bit of kit and enabling some free time for developers to engage with and over said idea/API/bit of kit. Yes of course you will get any number of good prototypes and even working applications – but better you will get to meet a number of developers, showing off their skills and often their newly acquired ones – this is really as rare as hen’s teeth (usually because they are fully employed fulfilling other peoples’ ambitions) engaging over a dedicated period, with peers they may not have yet met, over your technology or challenge. Yes, your super-sexy next bazillion idea might come out of this – but you created the environment for that conversation, that dev-to-dev spark.
But yet…
The thing I have noted today after Hactivate is that the sponsors are actually dedicated to seeing the apps go beyond the hack day. The winning app was one built to try to address human trafficking, and it was designed with an interface so simple that anyone could take it up without needing access to anything too technical; then we could crowdsource people’s safety.
The judges are determined – from a human pov, not only for the brands they represented – to help collate the necessary charity network information and wherewithal to make it happen. However, the geeks who thought it needed to happen, and were so passionate about beating human trafficking that they spent their weekend building an application to make people a little bit safer, found it hard to adjust to the jump of someone actually taking it on and helping make it happen (within 24 hours). Possibly because they had been coding non-stop for 24 hours and presenting to press, sponsors and co-hackers; more probably because they were not used to their ideas being taken up so strongly and immediately by the kind of brands that can really make it a reality.
Such is the magic of a hack day.
This is why I love hack days… dilemma
And so…
The point of a hack day for a developer is to be with like-minded people, work on your own stuff, learn and be celebrated; for the rest of us, it is to create the environment for magic to happen.
Maybe in the next few years they may become simply about the prototype, but I hope that day is a long way off. The point is developers, living and learning from each other in an environment that is created by you: the challenger.
Finally…
As ever, my cry is: please, do not take the piss, developers are for life not just for your *next million* or *save the world* idea. They are an asset to be cherished and nurtured and they do not necessarily always value the same things you do. It is rarely money or jobs – most developers are awash with job offers, and extra-curricular *cash* offers.
Hack days do work, right now, because everyone wins when they are run well and with consideration. But please don’t ask me what my favourite app is that was ever built at a hack day! I can’t tell you, I have no idea. I do however now know 200+ developers whom I would be able to call in a heartbeat, and know their skills, passions and talents – but I would never sell them to anyone.
Developers are a talent to be nurtured in our open data and open society world. Hack days respect this and act as breathing spaces for devs.
It is rarely about the prototype, and when it is, I will probably go buy that flower shop I have been promising myself.



![When advertisers figure this out, our only weapon will be blue sharpies and "[disputed]".](http://imgs.xkcd.com/comics/citations.png)