Emma Boulton doesn’t let a good question about designing and using surveys as part of a project’s research activities go unanswered. Q: Have you been good this year? A: Yes|No. Think carefully. Santa knows the answer.
Our series progresses through the talks from YUIConf 2012 and takes us to the next one from Douglas Crockford. This was a lively evening keynote after the dinner at the conference. As always, keep up on the latest videos on YUI Theater and YouTube.
In this evening keynote from YUIConf 2012, the legendary JavaScript architect Douglas Crockford discusses one of the most elusive of all programming concepts: the monad. With this talk, Crockford attempts to break the long-standing Monad tutorial curse by explaining the concept and applications of monads in a way that is actually understandable to the audience.
I was asked about slides for my talks at Northeast PHP, so I figured I would post them here so folks could benefit. I gave three MySQL talks. In the list below, the talk name links to the description on the conference website, and you can get the slides by clicking the “PDF slides” links.
Are You Getting the Best Out of Your MySQL Indexes – PDF slides
Getting Rid of Scheduled Tasks Using MySQL Events – PDF slides
Better JOINs and Subqueries – PDF slides
Hope you enjoy them!
As many folks know, I do a bit of traveling, both going to conferences, and speaking at them (MySQL and others). So I have compiled a list of tips and tricks, from the basics like “do not forget to eat breakfast” to putting your business cards inside your bag. I have a list with pictures that I will add to as I think of more. I hope you enjoy this tumblr-style list of conference tips!
Today and tomorrow I am at CodeConnexx – An Open Source Technology and Life Conference. There are some great talks. The first talk this morning is Secrets and Success in the Style of GLEE, by Jennifer Marsman of Microsoft: a bunch of songs and how they relate to being successful.
Taylor Swift, “Speak Now” – be vocal; if you have an idea, do not be shy about it. In an interview, “don’t stop talking”, meaning show them your passion, but don’t force it, of course. Ask lots of questions and do not make assumptions. And if you get stuck on a problem, you can reason your way through it by talking out loud.
Bonnie Raitt – “Let’s Give Them Something To Talk About”. Communicate! Let your manager know what’s going on and what you are doing. “Give them stories to tell about you” – and good ones too! Trip reports if you go on a trip, summary status e-mails, one-on-one meetings, etc.
Aretha Franklin – “Respect”. At the end of the day, diversity of opinion, education levels, backgrounds, is key to having successful business ideas. You can learn something from everyone. You can go around at a party, meet people and figure out what they are better than you at.
Frank Sinatra – “Luck Be a Lady”. At another conference, Jennifer saw a formula for success: Success = hard work + intelligence + luck. She did not like it, because luck is randomness, and if we work hard and are smart, we should be successful, right? But it is completely true that luck is part of the equation; it plays a part in an interview, for example. What you can control is how hard you work and how much you learn, so that you are well-positioned when opportunity knocks.
Bette Midler – “Wind Beneath My Wings”. Role models are people that you admire from afar, perhaps follow on Twitter, but probably won’t have a relationship with. Mentors are people who are actively helping you grow and with whom you have a relationship. You can choose a mentor to help you work on a skill or set of skills, and you can choose different people based on what they are good at. Someone who is good at MySQL might not be good at blogging or work/life balance. Jennifer challenges all of us to be role models to other people by speaking and blogging: people who do that are seen as industry experts, so you will become a role model.
Britney Spears – “Oops I Did It Again”. The importance of making mistakes: making mistakes is good. Take big risks, because when a mistake does happen, you can learn from it. When a panel of successful tech women were asked what they would do differently, they said they would have taken more big risks.
Bill Withers – “Lean on Me”. Delegate if you need to. Ask people for help. Out of time, health and money, you can have any 2 of the 3. Young folks usually have time and health but not money. In your 30s, you might have health and money but not time. And when you get older, you have time and money, but not health. Optimize for what you do not have. If you do not have time, then find ways to get more time; for example, buying pre-made salad or hiring a cleaner is a money/time tradeoff.
Journey – “Don’t Stop Believing”. Believe in yourself, and read about Impostor Syndrome on Wikipedia. If you do not know something, say so.
We have a backup server that, from time to time, gets errors when doing mysqldump backups (we do physical backups and logical backups, but the physical backups work fine). The errors look like this:
mysqldump: Couldn't execute 'SHOW FUNCTION STATUS WHERE Db = 'mozillians_org'': Out of resources when opening file '/tmp/#sql_3b63_0.MYI' (Errcode: 24) (23)
mysqldump: Error: 'Out of resources when opening file '/tmp/#sql_3b63_2.MYI' (Errcode: 24)' when trying to dump tablespaces
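The "Errcode: 24" in these messages is an operating-system error number, not a MySQL error code. On Linux it is EMFILE, "Too many open files", meaning the process hit its open-file-descriptor limit (MySQL's perror utility decodes these numbers too). A minimal Python sketch of the same lookup, using only the standard errno and os modules; error numbers are platform-specific, so the result assumes a Linux-like system:

```python
import errno
import os


def describe_errcode(code):
    """Return (symbolic name, message) for an OS error number,
    such as the 'Errcode: 24' reported by mysqldump above."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errcode(24)
    print(f"Errcode 24 = {name}: {message}")
```

Since the failure is about descriptors rather than disk or memory, raising ulimits or reducing how many files MySQL keeps open are the natural levers.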
I tried restarting MySQL, and that helped, for a while. It helped to the point that we put in a cron job to restart MySQL every 4 hours so we would not run out of resources.
But that did not last forever. We tried restarting more frequently. We tried increasing ulimits. Again, this helped for a while, or seemed to.
When it happened again today, I decided to look around again for what other folks’ experience was. I ended up finding someone who had this problem on Windows, and what fixed it for them was changing table_cache (table_open_cache in MySQL 5.1 and higher).
Now, I am a staunch fighter for the Battle Against Any Guess. So I asked myself, “Does this make sense? Would changing this actually free up any resources?” It did make sense, especially when I considered what was happening when I restarted MySQL or raised the ulimits: resources were freed. If fewer resources were tied up in the table_open_cache, that might also help, so I decided to give it a try.
I reduced the table_open_cache from 1024 to 200 – since the server in question is a backup server, it does not need such a large value. Well, as you can guess from the title, it worked!
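A back-of-the-envelope sketch of why a smaller table cache can help: every cached table may hold file descriptors that count against the same open-files limit the ulimit controls. The per-table figure below is an assumption (a MyISAM table needs a descriptor for its .MYI index file and one for its .MYD data file when first opened), and the estimate ignores connections, logs, InnoDB files and temporary tables, so treat it as a rough lower bound rather than a capacity plan. The function names are hypothetical.

```python
import resource
from typing import Optional


def table_cache_fds(table_open_cache, fds_per_table=2):
    """Rough estimate of descriptors the table cache alone could keep open.

    fds_per_table=2 is an assumption (MyISAM .MYI + .MYD). Connections,
    logs, InnoDB files and temporary tables are not counted.
    """
    return table_open_cache * fds_per_table


def soft_nofile_limit() -> Optional[int]:
    """Soft RLIMIT_NOFILE of *this* process, or None if unlimited.

    mysqld's own limit depends on how it was started and on its
    open_files_limit setting, so check that on the server itself.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    for cache in (1024, 200):
        print(f"table_open_cache={cache}: ~{table_cache_fds(cache)} descriptors")
    print("soft open-files limit here:", soft_nofile_limit())
```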
You may have noticed that I stopped posting the “weekly news” from the Mozilla DB Team. After going through the Operations Report Card and applying it to DBAs in OurSQL Podcast numbers 111, 112, 114, 115 and 116, I started thinking that the updates were really more like metrics, and that it would better serve my own purposes to do the updates monthly.
The purposes of doing this type of blog post are:
0) Answering “So what does a DBA do, anyway?”
1) Answering “DBA? At Mozilla? Does Firefox have a database? Why does Mozilla have databases, and what for?”
2) Showing what the DB team does for Mozilla, so that folks will understand that “just keeping things working” is actually a lot of work. It also helps compile yearly reviews of accomplishments.
We are also starting to get some metrics information. This month we started easy – number of MySQL and Postgres machines, number of unique databases (mysql, information_schema, performance_schema and test are ignored, and duplicates, like the same database on a master and 2 slaves, are ignored), and version information.
As of today, we have 9 unique databases across 8 Postgres servers in 4 clusters, with 6 being Postgres 9.0 and 2 being Postgres 9.2 – we are currently upgrading all our Postgres databases to 9.2 and expect that by the end of December all servers will be using 9.2.
We have 427 unique databases across 98 MySQL DB machines in 20 clusters, with 3 being MySQL 5.0, 71 being MySQL 5.1 (mostly Percona’s patched 5.1), and 24 being MariaDB 5.5.
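The counting rule described above (skip system schemas, count a database once even when it lives on a master and its slaves) can be sketched in a few lines of Python. The inventory layout, the host and database names, and the choice to key on (cluster, database), so that two clusters with a same-named database count twice, are all assumptions for illustration.

```python
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema", "test"}


def unique_databases(inventory):
    """Return the set of (cluster, database) pairs, skipping system schemas.

    inventory maps cluster -> host -> list of databases on that host, so a
    database replicated to several hosts in one cluster is counted once.
    """
    seen = set()
    for cluster, hosts in inventory.items():
        for databases in hosts.values():
            for db in databases:
                if db not in SYSTEM_SCHEMAS:
                    seen.add((cluster, db))
    return seen


if __name__ == "__main__":
    # Hypothetical inventory: one master and two slaves with the same data.
    inventory = {
        "cluster-a": {
            "master": ["mysql", "information_schema", "bugs", "crashstats"],
            "slave1": ["mysql", "information_schema", "bugs", "crashstats"],
            "slave2": ["mysql", "test", "bugs", "crashstats"],
        },
    }
    print(len(unique_databases(inventory)))  # 2
```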
And in the last week of October and the month of November, we have:
- Documented 4 more of our Nagios checks.
- Started to upgrade Postgres databases to Postgres 9.2
- Decommissioned a legacy database cluster for Firefox themes.
- Built a new database cluster (complete with monitoring and backups) for a new Sentry implementation.
- Upgraded X machines for general operating system updating purposes and to ensure that be2net drivers are up-to-date; out-of-date drivers can (and have!) caused servers to crash. (b1-db1, b1-db2, addons1, addons2, 3,4,5,
- Upgraded MySQL 5.0 to MySQL 5.1 across X clusters and Y machines (a01, b02, b2, 7)
- Did a quarterly purge of Crash Stats data.
- Had to re-sync 6 slaves after a transaction was rolled back on the master: some of the tables it modified were MyISAM, which is non-transactional, so the master kept changes in those tables that were out of sync with the slaves.
- Assisted in converting to use UTC timestamps in the Elmo database behind the Mozilla localization portal and the Bugzilla Anthropology Project, prompting a blog post on converting timezone-specific times in MySQL.
- Decommissioned a legacy “production generic” database cluster that had over 60 databases on it.
- Built a 5th database backup instance due to growing backup needs.
- Changed the binary log format to MIXED on our JIRA installation, both because JIRA requires it and because, after the upgrade to MySQL 5.1, the server issued warnings that MySQL 5.0 had not.
- Added checksums to the database cluster that runs Input and Firefox about:home snippets.
- Archived and dropped the database behind Rock Your Firefox.
- Exported Bugzilla data for a research project. Did you know if you are doing academic research, you can get a copy of Mozilla’s public Bugzilla data?
- Gave read-only database access to a developer behind the Datazilla project.
- Updated the email list for vouched Mozillians.
- Backfilled missing crash-stats data after some failed cron scripts.
- Cleared some junk data from the Datazilla databases.
- Added new custom fields to our implementation of Bugzilla for upcoming release versions: Firefox 20, Thunderbird 20, Thunderbird ESR 20 and SeaMonkey 2.17.
- Added 10 new machines to the Graphs database, and added new sets of machines for Mozilla ESR 17 and Thunderbird ESR 17.
- Gave read-only database access to the two main leads of the Air Mozilla project.
- Debugged and turned off a 10-second timeout in our load balancing pool that was causing Postgres monitors and processors to lose connection to their databases.
- Discovered that the plugins database actually does better with the query_cache turned on, and tuned its size.
- Tweaked tokudb_cache_size and innodb_buffer_pool_size on our Datazilla databases so that less swap would be used.
- Created 2 read/write accounts for 2 people to access the development database for Mozillians.
- Gave access to Datazilla databases for staging.
Wednesday, Nov 28th was my 1-year anniversary at Mozilla. Tomorrow is December! 2012 went by very quickly.
Back in September 2009 I wrote a blog post titled “Simple Puppet Module Structure” which introduced a simple approach to writing Puppet Modules. This post has been hugely popular in the community – but much has changed in Puppet since then so it is time for an updated version of that post.
As before, I will show a simple module for a common scenario. Rather than considering this module a blueprint for every module out there, you should instead study its design and use it as a starting point when writing your own modules. You can build on it and adapt it, but the basic approach should translate well to more complex modules.
I should note that while I work for Puppet Labs, I do not know if this reflects any kind of standard approach suggested by Puppet Labs; this is what I do when managing my own machines and no more.
The most important deliverables
When writing a module I have a few things I keep in mind; these are all centered around downstream users of my module and future-me trying to figure out what is going on:
- A module should have a single entry point where someone reviewing it can get an overview of its behavior
- Modules that have configuration should be configurable in a single way and in a single place
- Modules should be made up of several single-responsibility classes. As far as possible these classes should be private details hidden from the user
- For the common use cases, users should not need to know individual resource names
- For the most common use case, users should not need to provide any parameters; defaults should be used
- Modules I write should have a consistent design and behaviour
The module layout I will present below is designed so that someone who is curious about the behaviour of the module only has to look in init.pp to see:
- All the parameters and their defaults used to configure the behaviour of the module
- Overview of the internal structure of the module by way of descriptive class names
- Relationships and notifications that exist inside the module and what classes they can notify
This design will never remove the need for documenting your modules but a clear design will guide your users in discovering the internals of your module and how they interact with it.
More important than what a module does is how accessible it is to you and others: how easy it is to understand, debug and extend.
Thinking about your module
For this post I will write a very simple module to manage NTP. It really is very simple; you should check the Forge for more complete ones.
To go from nowhere to having NTP on your machine you would have to do:
- Install the packages and any dependencies
- Write out appropriate configuration files with some environment specific values
- Start the service or services you need once the configuration files are written, and restart them if the config files change later.
There is a clear implied dependency chain here and this basic pattern applies to most pieces of software.
These 3 points translate to distinct groups of actions, and, sticking with the above principle of single-function classes, I will create a class for each group.
To keep things clear and obvious I will call these class install, config and service. The names don’t matter as long as they are descriptive – but you really should pick something and stick with it in all your modules.
Writing the module
I’ll show the 3 classes that do the heavy lifting here and discuss parts of them afterwards:
class ntp::install {
  package{'ntpd':
    ensure => $ntp::version
  }
}
class ntp::config {
  $ntpservers = $ntp::ntpservers
  File{
    owner => root,
    group => root,
    mode  => '0644',
  }
  file{'/etc/ntp.conf':
      content => template('ntp/ntp.conf.erb');
    '/etc/ntp/step-tickers':
      content => template('ntp/step-tickers.erb');
  }
}
class ntp::service {
  $ensure = $ntp::start ? {true => running, default => stopped}
  service{"ntp":
    ensure => $ensure,
    enable => $ntp::enable,
  }
}
Here I have 3 classes that serve a single purpose each and do not contain any details like relationships, ordering or notifications. They roughly just do the one thing they are supposed to do.
Take a look at each class and you will see they use variables like $ntp::version, $ntp::ntpservers etc. These are variables from the main ntp class; let’s take a quick look at that class:
# == Class: ntp
#
# A basic module to manage NTP
#
# === Parameters
# [*version*]
#   The package version to install
#
# [*ntpservers*]
#   An array of NTP servers to use on this node
#
# [*enable*]
#   Should the service be enabled during boot time?
#
# [*start*]
#   Should the service be started by Puppet
class ntp(
  $version = "present",
  $ntpservers = ["1.pool.ntp.org", "2.pool.ntp.org"],
  $enable = true,
  $start = true
) {
  class{'ntp::install': } ->
  class{'ntp::config': } ~>
  class{'ntp::service': } ->
  Class["ntp"]
}
This is the main entry point into the module that was mentioned earlier. All the variables the module uses are documented in a single place, the basic design and parts of the module are clear, and you can see the relationships between the parts and that the service class can be notified.
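To make the arrows concrete, here is a toy Python model of the chain in the ntp class. It is an illustration, not how Puppet itself is implemented: -> orders the left class before the right one, and ~> additionally sends a refresh to the right side when the left side changes. It uses the standard library's graphlib (Python 3.9+) to find an order that satisfies every arrow; the class names are the ones from the module, while CHAIN, apply_order and refreshed_by are hypothetical helpers.

```python
from graphlib import TopologicalSorter

# (left, arrow, right) triples mirroring the chain in class ntp.
CHAIN = [
    ("ntp::install", "->", "ntp::config"),
    ("ntp::config", "~>", "ntp::service"),
    ("ntp::service", "->", "ntp"),
]


def apply_order(edges):
    """Return an order in which every left-hand class precedes its right-hand class."""
    sorter = TopologicalSorter()
    for left, _arrow, right in edges:
        sorter.add(right, left)  # right depends on left
    return list(sorter.static_order())


def refreshed_by(edges, changed):
    """Classes that receive a refresh because a class in `changed` changed.

    Only '~>' edges carry refresh events in this model.
    """
    return {right for left, arrow, right in edges if arrow == "~>" and left in changed}


if __name__ == "__main__":
    print(apply_order(CHAIN))
    print(refreshed_by(CHAIN, {"ntp::config"}))
    print(refreshed_by(CHAIN, {"ntp::install"}))
```

In this model a changed config file restarts the service, while a change in ntp::install alone does not, which is what the choice of -> versus ~> in the ntp class expresses.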
I use the new chaining features to inject the dependencies and relationships here, which surfaces these important interactions between the various classes back up to the main entry class, where users can see them easily. All this information is immediately available in the obvious place, without looking at any additional files or being bogged down with implementation details. The final link in the chain, -> Class["ntp"], requires some extra explanation: it ensures that all the NTP member classes are applied before the main ntp class, so that when someone says require => Class["ntp"] elsewhere, they can be sure the associated tasks are completed. This is a lightweight version of the Anchor Pattern.
Using the module
Let’s look at how you might use this module from knowing nothing.
Ideally simply including the main entry point on a node should be enough:
include ntp
This does what you’d generally expect – installs, configures and starts the NTP service.
After looking at the init.pp you can now supply some new values for some of the parameters to tune it for your needs:
class{"ntp": ntpservers => ["ntp1.example.com", "ntp2.example.com"]}
Or you can use the new data bindings in Puppet 3 and override these parameters in Hiera by supplying data for keys like ntp::ntpservers.
Finally, if for some other reason you need to restart the service, you know from looking at the ntp class that you can notify the ntp::service class to achieve that.
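The lookup order behind this can be sketched as a toy Python model of Puppet 3 class-parameter resolution: a value passed explicitly in class{...} wins, otherwise the Hiera key <class>::<param> is used if present, otherwise the default from the class definition applies. The helper name and the Hiera data are hypothetical; only the key naming follows the data-bindings convention.

```python
NTP_DEFAULTS = {
    "version": "present",
    "ntpservers": ["1.pool.ntp.org", "2.pool.ntp.org"],
    "enable": True,
    "start": True,
}


def resolve_param(cls, param, explicit, hiera, defaults):
    """Toy model of Puppet 3 parameter resolution for one class parameter."""
    if param in explicit:          # 1. class{"ntp": ntpservers => [...]}
        return explicit[param]
    key = f"{cls}::{param}"        # 2. automatic Hiera lookup, e.g. ntp::ntpservers
    if key in hiera:
        return hiera[key]
    return defaults[param]         # 3. default from the class definition


if __name__ == "__main__":
    hiera = {"ntp::ntpservers": ["ntp1.example.com", "ntp2.example.com"]}  # hypothetical data
    for name in NTP_DEFAULTS:
        print(name, "=", resolve_param("ntp", name, {}, hiera, NTP_DEFAULTS))
```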
Using classes for relationships
There’s a huge thing to note here in the main ntp class. I specify all relationships and notifies on the classes and not the resources themselves.
As personal style I only mention resources by name inside a class that contains that resource – if I ever have to access a resource outside of the class that it is contained in I access the class.
I would not write:
class ntp::service { service{"ntp": require => File["/etc/ntp.conf"]} }
There are many issues with this approach that mostly come down to maintenance headaches. Here I require the ntp config file, but what if a service has more than one file? Do you then list all the files? Do you later edit every class that references them when another file gets managed?
These issues quickly multiply in a large code base. By always acting on class names and by creating many small single purpose classes as here I effectively contain these by grouping names and not individual resource names. This way any future refactoring of individual classes would not have an impact on other classes.
So the above snippet would rather be something like this:
class ntp::service { service{"ntp": require => Class["ntp::config"]} }
Here I require the containing class and not the resource, which has the effect of requiring all resources inside that class. This isolates changes to that class and spares users from worrying about the internal implementation details of the other class. Along the same lines you can also notify a class, and all resources inside that class get notified.
I only include other classes at the top ntp level and never have include statements in classes like ntp::config and so forth; this means when I require the class ntp::config or notify ntp::service I get just what I want and no more.
If you create big complex classes, you run the risk of having refreshonly execs that relate to configuration or installation associated with services in the same class, which would have disastrous consequences if you notify the wrong thing or if a user does not study your code before using it.
A consistent style of small, single-purpose, descriptively named classes avoids these and other problems.
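The containment argument can be illustrated with a toy Python catalog, again an assumption-laden model rather than Puppet internals: a Class["..."] reference expands to every resource the class contains, while a plain resource reference covers only itself. Requiring one file by name silently misses the second file that ntp::config manages, whereas requiring the class covers both, and any file added to it later. CATALOG, expand and uncovered are hypothetical names.

```python
# Toy catalog: each class name maps to the resources it contains.
CATALOG = {
    "ntp::config": ['File["/etc/ntp.conf"]', 'File["/etc/ntp/step-tickers"]'],
    "ntp::service": ['Service["ntp"]'],
}

CLASS_PREFIX = 'Class["'


def expand(reference, catalog):
    """Expand Class["x"] into every resource class x contains;
    a plain resource reference expands to itself."""
    if reference.startswith(CLASS_PREFIX) and reference.endswith('"]'):
        return set(catalog.get(reference[len(CLASS_PREFIX):-2], []))
    return {reference}


def uncovered(requires, catalog, needed):
    """Resources in `needed` that the `requires` list does not reach."""
    covered = set()
    for reference in requires:
        covered |= expand(reference, catalog)
    return set(needed) - covered


if __name__ == "__main__":
    config_files = CATALOG["ntp::config"]
    # Requiring one file by name misses the other file ntp::config manages...
    print(uncovered(['File["/etc/ntp.conf"]'], CATALOG, config_files))
    # ...while requiring the class covers everything it contains.
    print(uncovered(['Class["ntp::config"]'], CATALOG, config_files))
```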
What we learned and further links
There is a lot to learn here and much of it is about soft issues like the value of consistency and clarity of design and thinking about your users – and your future self.
On the technical side, you should learn about the effects of basing relationships and notifications on containing classes rather than on individual resource names.
And we came across a number of recently added Puppet features:
- Parameterized classes
- Chaining Arrows
- Data Bindings as introduced in Puppet 3
Parameterized Classes are used to provide multiple convenient methods for supplying data to your module: defaults in the module, explicit values in code, Hiera and (not shown here) an ENC.
Chaining Arrows are used in the main class to inject the dependencies and notifications in a way that is visible without having to study each individual class.
These are important new additions to Puppet. Some new features like Parameterised classes are not quite ready for prime time imho but in Puppet 3 when combined with the data bindings a lot of the pain points have been removed.
Finally, there are a number of useful things I did not mention here. Specifically, you should study the Puppet Style Guide and use the Puppet Lint tool to validate that your modules comply. You should consider writing tests for your modules using rspec-puppet, and finally share them on the Puppet Forge.
And perhaps most importantly – do not reinvent the wheel, check the Forge first.
Les James proposes an alternative to the fully fluid grid as an approach to responsive layout challenges. Sprinkle on some Sass fairy dust and, providing you’ve been good this year, watch your creation spring to life.
Laura Kalbag beckons us in from the cold wastelands of transitional, device-rooted layouts to warm our toes at the hearth of a more systematic way of working.
Inspired by the benchmark in this post, we decided to run some NDB vs Galera benchmarks for ourselves.
We confirmed that NDB does not perform well using m1.large instances. In fact, it’s totally unacceptable - no setup should ever have a minimum latency of 220ms - so m1.large instances are not an option. Apparently the instances get CPU bound, but CPU utilization never goes above ~50%. Maybe top/vmstat can’t be trusted in this virtualized environment?
So, why not use m1.xlarge instances? This sounds like a better plan!
As in the original post, our dataset is 15 tables of 2M rows each, created with:
./sysbench --test=tests/db/oltp.lua --oltp-tables-count=15 --oltp-table-size=2000000 --mysql-table-engine=ndbcluster --mysql-user=user --mysql-host=host1 prepare
Benchmark against NDB was executed with:
for i in 8 16 32 64 128 256
do
./sysbench --report-interval=30 --test=tests/db/oltp.lua --oltp-tables-count=15 --oltp-table-size=2000000 --rand-init=on --oltp-read-only=off --rand-type=uniform --max-requests=0 --mysql-user=user --mysql-port=3306 --mysql-host=host1,host2 --mysql-table-engine=ndbcluster --max-time=600 --num-threads=$i run > ndb_2_nodes_$i.txt
done
After we shut down NDB, we started Galera and recreated the tables, but found that running sysbench failed. A suggestion from Hingo was to use --oltp-auto-inc=off, which worked.
Our benchmark against Galera was executed with:
for i in 8 16 32 64 128 256
do
./sysbench --report-interval=30 --test=tests/db/oltp.lua --oltp-tables-count=15 --oltp-table-size=2000000 --rand-init=on --oltp-read-only=off --rand-type=uniform --max-requests=0 --mysql-user=user --mysql-port=3306 --mysql-host=host1,host2 --mysql-table-engine=ndbcluster --max-time=600 --num-threads=$i --oltp-auto-inc=off run > galera_2_nodes_$i.txt
done
Below are the graphs of average throughput at the end of 10 minutes, and 95% response time.




Galera clearly performs better than NDB with 2 instances!
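For readers unfamiliar with the second metric: a 95% response time is the latency within which 95% of requests completed. A minimal nearest-rank sketch of that definition follows; it illustrates what the number means and is not necessarily how sysbench computes it internally. The function name is hypothetical, and pct is taken as an integer so the rank can be computed without floating-point rounding.

```python
def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile (integer, 0 < pct <= 100) by nearest rank:
    the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = list(range(1, 101))  # hypothetical: 1 ms .. 100 ms
    print(nearest_rank_percentile(latencies_ms, 95))  # 95
```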
But things become very interesting when we graph the reports generated every 10 seconds.


Surprised, right? What is that?
Here we see that even if the workload fits completely in the buffer pool, the high number of TPS causes aggressive flushing.
We assume the benchmark in the Galera blog post was CPU bound, while in our benchmark the behavior is I/O bound.
We then added 2 more nodes (m1.xlarge instances), kept the dataset at 15 tables x 2M rows, and re-ran the benchmark with NDB and Galera. Galera’s performance hit a ceiling due to I/O. In fact, with Galera, we found that performance on 4 nodes was worse than with 2 nodes; we assume this is because the whole cluster runs at the speed of its slowest node.
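The "slowest node" assumption can be put as a toy model: in a synchronously replicated cluster every node has to apply every write, so sustained write throughput is capped by the slowest node, and adding a node that is slower (for example, more I/O bound) can lower the cap. The per-node rates below are hypothetical, not measurements from this benchmark, and the model ignores network latency and certification conflicts.

```python
def cluster_write_ceiling(node_tps):
    """Toy model: sustained cluster write rate is capped by the slowest node."""
    if not node_tps:
        raise ValueError("empty cluster")
    return min(node_tps)


if __name__ == "__main__":
    # Hypothetical per-node write rates (transactions per second).
    print(cluster_write_ceiling([900, 880]))            # 880
    print(cluster_write_ceiling([900, 880, 860, 410]))  # 410
```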
Performance on NDB keeps growing as new nodes are added, so we added another 2 nodes for just NDB (6 nodes total).


The graphs show that NDB scales better than Galera, which is not what we expected to find.
It is perhaps unfair to say that NDB scales better than Galera; it is more accurate to say that NDB checkpoints put less stress on I/O than InnoDB checkpoints do, so the bottleneck is InnoDB rather than Galera itself. To be more precise, the bottleneck is slow I/O.
The following graph shows the performance with 512 threads and 4 nodes (NDB and Galera) or 6 nodes (NDB only). Data was collected every 30 seconds.

Paul Robert Lloyd engages with the two main approaches to the matter of responsive images and finds them wanting. Could “Bah, humbug!” be a reasonable response to markup excess?
Rebecca Cottrell speeds through the dark landscape of web wireframes towards the snowy slopes of early prototypes with glittering animations and transitions that show your developing product at its best.
Dan Donald conjures up the ghost of Christmas the Web Yet To Come through the possibilities offered by the contextual data available to us from web-enabled devices.
Val Head marshals overexcited CSS transitions and animations, which are like naughty children elbowing their way out of the presentation layer and into the behaviour grotto to get at the goodies before Christmas. Santa will be pleased!
The HTTPbis Working Group met in Atlanta last month; here’s how things are going.
HTTP/1.1
We’re now out of Working Group Last Call on all of our “core” documents, so the editors are working through the issues it brought up. As soon as that’s done, we’ll go to IETF Last Call, and hopefully soon after we’ll have a number of new RFCs defining HTTP/1.1.
Here, you can see the upswing in number of issues during our WGLC period:

To see for yourself, use the "work-in-progress" documents linked from our home page.
As part of that process, I also spent some time updating the parts of the documents that detail changes from RFC2616, since this will be the easiest way for most developers to get an idea of what’s changed. See:
- part 1: Messaging - Changes from RFC2616
- part 2: Semantics - Changes from RFC2616
- part 4: Conditional Requests - Changes from RFC2616
- part 5: Range Requests - Changes from RFC2616
- part 6: Caching - Changes from RFC2616
- part 7: Authentication - Changes from RFC2616 and RFC2617
Note that these lists are by no means complete, and they’ll likely change more before we publish.
HTTP/2.0
We also started work in earnest on HTTP/2.0, with initial discussions focusing on header compression and the upgrade mechanism. We now have a first draft (which is just a straight copy of the SPDY document, to give us a decent basis for future diffs) and the beginnings of an issue list.
The rough approach to upgrade being discussed is to use something like NPN if the connection is using TLS; we’ve communicated that requirement to the TLS Working Group, and they have decided (with a little nudging from their AD ;) to begin work on that. Note that NPN is just one proposal in this space.
If TLS isn’t being used, we’re looking at using HTTP upgrade as a base; see Gabriel and Willy’s draft for a good description of the considerations around that. Furthermore, we want to be able to optimise it, potentially using a DNS record (see Eliot’s new draft for a proposal), and perhaps a header like SPDY’s Alternate-Protocol.
For compression, we have a number of proposals to replace zlib, since CRIME took it off the table. So far, the only one with an implementation is Roberto’s; we’d like to be able to do a bake-off and use a common set of sample headers to compare them.
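A bake-off of this kind needs little more than a shared corpus and a common scoring rule. The Python sketch below is a hypothetical harness: the sample headers are made up, and the only "real" candidate is zlib with one compression context shared across requests (the SPDY-style baseline being replaced, and the kind of shared context that CRIME exploits). The proposals on the table would be plugged in as further entries in the candidates dictionary; none of them is implemented here.

```python
import zlib

# Hypothetical sample request headers; a real bake-off would use a shared corpus.
SAMPLE_HEADERS = [
    [(":method", "GET"), (":path", "/index.html"), ("host", "www.example.com"),
     ("user-agent", "ExampleBrowser/1.0"), ("accept", "text/html,application/xhtml+xml"),
     ("cookie", "session=abc123")],
    [(":method", "GET"), (":path", "/style.css"), ("host", "www.example.com"),
     ("user-agent", "ExampleBrowser/1.0"), ("accept", "text/css,*/*;q=0.1"),
     ("cookie", "session=abc123")],
]


def serialize(headers):
    """Flatten one header list into bytes, HTTP/1.1 style."""
    return "".join(f"{name}: {value}\r\n" for name, value in headers).encode("ascii")


def uncompressed(blocks):
    """Baseline: bytes on the wire with no compression."""
    return [len(block) for block in blocks]


def zlib_shared_context(blocks):
    """One zlib stream shared across the connection, flushed after every
    header block so each block could be sent immediately."""
    compressor = zlib.compressobj()
    return [len(compressor.compress(b) + compressor.flush(zlib.Z_SYNC_FLUSH)) for b in blocks]


def bake_off(candidates, header_sets):
    """Total bytes each candidate needs for the same sequence of header blocks."""
    blocks = [serialize(h) for h in header_sets]
    return {name: sum(encode(blocks)) for name, encode in candidates.items()}


if __name__ == "__main__":
    results = bake_off({"none": uncompressed, "zlib": zlib_shared_context}, SAMPLE_HEADERS)
    for name, total in sorted(results.items(), key=lambda item: item[1]):
        print(f"{name:>5}: {total} bytes")
```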
To keep things moving, we’ve scheduled an interim meeting for HTTP/2.0 issues in late January. If you’d like to come, please respond by the deadline; be aware, however, that this will be very much a working session.
Finally, some people may be interested to know that we now have a http_2 twitter account that will occasionally spout HTTP/2.0-related news; for those who want to track the effort without all of the details, it may be what you’re looking for.
Here are a bunch of the HTTP-related folks having dinner in Atlanta at the conveniently in-hotel Trader Vic’s:

Stephen Fulljames places the reindeers of thought before the sleigh of action, encouraging coders everywhere to plan ahead when implementing JavaScript libraries.
Rachel Nabors brings together the web’s three Magi – HTML, CSS and JavaScript – to create seamless, soundtracked animations in the browser.
If you have yet to be convinced that YUIConf 2012 was the best conference yet, then perhaps this next video will help change your mind. We continue our series of video releases with one entitled “Mojito for YUI Developers” by Caridy Patiño. You can find previous posts in this series here, as well as keep up with all of the videos via YUI Theater and YouTube.
In this presentation from YUIConf 2012, Mojito engineer Caridy Patiño talks about the Mojito project and its usefulness for YUI developers. In particular, Caridy discusses Mojito’s rich offering of boilerplate and building capabilities for YUI projects, with a variety of options to build traditional YUI web apps, mobile apps to deploy on devices, and Node.js applications as well. Caridy also describes how Mojito leverages the YUI library and YUI tool chains to build at scale.
I had a customer recently who needed to reduce their database size on disk quickly without a lot of messy schema redesign and application recoding. They didn’t want to drop any actual data, and their index usage was fairly high, so we decided to look for unused indexes that could be removed.
Collecting data
It’s quite easy to collect statistics about index usage in Percona Server (and others) using the User Statistics patch. By enabling ‘userstat_running’, we start to get information in the INFORMATION_SCHEMA.INDEX_STATISTICS table. This data collection does add some overhead to your running server, but it’s important to leave this running for a good long while to get a good dataset that is representative of as much of your workload as possible.
If you miss collecting index stats while some occasional queries run, you run the risk of dropping indexes that are used only seldom but are still important for the health of your system. This may or may not impact you, but I’d highly recommend you manually review the resulting list of unused indexes before you simply drop them.
Depending on your sensitivity to production load, you may therefore want to run this for several days, or just sample different short windows during your normal production peak. In either case, you may want to compare or repeat this index analysis, so let’s set up a separate schema for it. It’s important that this index analysis runs on a server with your full production dataset loaded, but it could be a master, or just a slave somewhere (just be careful not to break replication!).
mysql> create schema index_analysis;
If our index_statistics are collecting on the same server, then we can simply get a snapshot of it into our schema with one command:
mysql> create table index_analysis.used_indexes select * from information_schema.index_statistics;
If the stats come from some other server, then you may need to dump and load a copy of that table into your working index_analysis schema.
Merging stats from several servers
In the case of this client, they had a master and several slaves taking read traffic. The index workload on these two sets of servers was different and I wanted to make sure I considered the index statistics from both of these sources. Be sure you include all relevant index stats from all aspects of your application, otherwise garbage-in, garbage-out and you risk dropping necessary indexes.
To accomplish merging multiple result sets, I gathered statistics from both their master and slave and loaded them into my schema as separate tables. Then I simply created a view of a UNION DISTINCT of those two tables:
mysql> create view used_indexes as
(select TABLE_SCHEMA, TABLE_NAME, INDEX_NAME from master_index_stats )
UNION DISTINCT
(select TABLE_SCHEMA, TABLE_NAME, INDEX_NAME from slave_index_stats)
ORDER BY TABLE_SCHEMA, TABLE_NAME;
Now I can query the used_indexes view and see the union of both of those datasets. This, of course, can be extended to as many datasets as you want.
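Outside of MySQL, the same merge is just a set union keyed on (schema, table, index). The Python sketch below is a hypothetical illustration and not part of the original workflow. It assumes each server's INDEX_STATISTICS snapshot was exported as (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, ROWS_READ) tuples. The sample rows and the used_index_keys helper are made up.

```python
# Hypothetical exports of INDEX_STATISTICS from two servers, assumed to be
# (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, ROWS_READ) tuples; the data is made up.
master_index_stats = [
    ("sakila", "rental", "PRIMARY", 1200),
    ("sakila", "payment", "idx_fk_customer_id", 35),
]
slave_index_stats = [
    ("sakila", "rental", "PRIMARY", 98000),
    ("sakila", "film", "idx_title", 410),
]


def used_index_keys(*snapshots):
    """Mimic UNION DISTINCT: each (schema, table, index) appears once,
    however many servers reported reading it."""
    keys = set()
    for snapshot in snapshots:
        for schema, table, index, _rows_read in snapshot:
            keys.add((schema, table, index))
    return keys


used = used_index_keys(master_index_stats, slave_index_stats)
for key in sorted(used):
    print(key)
```

The row counts are dropped on purpose. As in the SQL view, all that matters is whether an index was read on any server.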
Interpreting the data
So, this is all well and good, but how do we easily determine which indexes are not being used? For this, we need to go back to the INFORMATION_SCHEMA to get a list of ALL the indexes on my system (or at least in the schemas where I want to consider dropping indexes). Let’s keep using views so the list updates dynamically as our schema changes over time:
mysql> create view all_indexes as
  select t.table_schema as TABLE_SCHEMA,
         t.table_name as TABLE_NAME,
         i.index_name as INDEX_NAME,
         i.NON_UNIQUE as NON_UNIQUE,
         count(*) as COLUMN_CNT,
         group_concat(i.column_name order by SEQ_IN_INDEX ASC SEPARATOR ',') as COLUMN_NAMES
  from information_schema.tables t
  join information_schema.statistics i using (table_schema, table_name)
  where t.table_schema like 'sakila%'
  group by t.table_schema, t.table_name, i.index_name;
Now I can query this view to see my indexes:
mysql> select * from all_indexes limit 1\G
*************************** 1. row ***************************
TABLE_SCHEMA: sakila
  TABLE_NAME: actor
  INDEX_NAME: idx_actor_last_name
  NON_UNIQUE: 1
  COLUMN_CNT: 1
COLUMN_NAMES: last_name
1 row in set (0.03 sec)
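The grouping in the all_indexes view can be sketched in Python to show what GROUP BY plus GROUP_CONCAT(... ORDER BY SEQ_IN_INDEX) does. STATISTICS returns one row per indexed column, and the view collapses those rows into one row per index, with the columns in index order. The row layout and the build_all_indexes function are assumptions for illustration. The rental_date rows are modeled on Sakila's rental table.

```python
from collections import defaultdict

# Hypothetical export of information_schema.STATISTICS: one row per indexed column,
# as (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, NON_UNIQUE, SEQ_IN_INDEX, COLUMN_NAME).
statistics_rows = [
    ("sakila", "rental", "rental_date", 0, 2, "inventory_id"),
    ("sakila", "rental", "rental_date", 0, 1, "rental_date"),
    ("sakila", "rental", "rental_date", 0, 3, "customer_id"),
    ("sakila", "actor", "idx_actor_last_name", 1, 1, "last_name"),
]


def build_all_indexes(rows):
    """Group column rows per index, like GROUP BY schema, table, index."""
    columns = defaultdict(list)  # (schema, table, index) -> [(seq, column), ...]
    non_unique = {}
    for schema, table, index, nu, seq, column in rows:
        key = (schema, table, index)
        columns[key].append((seq, column))
        non_unique[key] = nu
    result = {}
    for key, seq_cols in columns.items():
        # Sort by SEQ_IN_INDEX so the column order matches the index definition.
        ordered = [column for _, column in sorted(seq_cols)]
        result[key] = {
            "NON_UNIQUE": non_unique[key],
            "COLUMN_CNT": len(ordered),  # count(*) in the view
            "COLUMN_NAMES": ",".join(ordered),  # group_concat(... SEPARATOR ',')
        }
    return result


all_indexes = build_all_indexes(statistics_rows)
print(all_indexes[("sakila", "rental", "rental_date")])
```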
Now I need a way to find the set of indexes in all_indexes, but not in used_indexes. These indexes (if our original index statistics are good) are candidates to be dropped:
mysql> create view droppable_indexes as
  select all_indexes.table_schema as table_schema,
         all_indexes.table_name as table_name,
         all_indexes.index_name as index_name
  from all_indexes
  left join used_indexes using (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME)
  where used_indexes.INDEX_NAME is NULL
    and all_indexes.INDEX_NAME != 'PRIMARY'
    and all_indexes.NON_UNIQUE = 1;
Note that we also want to avoid dropping PRIMARY and UNIQUE indexes, since those tend to enforce important application data constraints, so we added two extra conditions to the end of the WHERE clause.
I can now select my droppable (unused) indexes from this view:
mysql> select * from droppable_indexes;
+--------------+---------------+-----------------------------+
| table_schema | table_name    | index_name                  |
+--------------+---------------+-----------------------------+
| sakila       | actor         | idx_actor_last_name         |
| sakila       | address       | idx_fk_city_id              |
| sakila       | city          | idx_fk_country_id           |
| sakila       | customer      | idx_fk_address_id           |
| sakila       | customer      | idx_fk_store_id             |
| sakila       | customer      | idx_last_name               |
| sakila       | film          | idx_fk_language_id          |
| sakila       | film          | idx_fk_original_language_id |
| sakila       | film          | idx_title                   |
| sakila       | film_actor    | idx_fk_film_id              |
| sakila       | film_category | fk_film_category_category   |
| sakila       | film_text     | idx_title_description       |
| sakila       | inventory     | idx_fk_film_id              |
| sakila       | inventory     | idx_store_id_film_id        |
| sakila       | payment       | fk_payment_rental           |
| sakila       | payment       | idx_fk_customer_id          |
| sakila       | payment       | idx_fk_staff_id             |
| sakila       | rental        | idx_fk_customer_id          |
| sakila       | rental        | idx_fk_inventory_id         |
| sakila       | rental        | idx_fk_staff_id             |
| sakila       | staff         | idx_fk_address_id           |
| sakila       | staff         | idx_fk_store_id             |
| sakila       | store         | idx_fk_address_id           |
+--------------+---------------+-----------------------------+
23 rows in set (0.02 sec)
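The LEFT JOIN ... WHERE used_indexes.INDEX_NAME IS NULL pattern is an anti-join: it keeps the rows from all_indexes that have no match in used_indexes. Here is a minimal Python sketch of the same filter. It assumes hypothetical in-memory versions of both views, and the sample data and the droppable function are illustrative only.

```python
# Hypothetical inputs: all known indexes (key -> NON_UNIQUE flag) and the set of
# (schema, table, index) keys that the usage statistics say were read.
all_indexes = {
    ("sakila", "actor", "PRIMARY"): 0,
    ("sakila", "actor", "idx_actor_last_name"): 1,
    ("sakila", "rental", "rental_date"): 0,
    ("sakila", "rental", "idx_fk_staff_id"): 1,
    ("sakila", "film", "idx_title"): 1,
}
used_indexes = {
    ("sakila", "actor", "PRIMARY"),
    ("sakila", "film", "idx_title"),
}


def droppable(all_idx, used):
    """Indexes present in all_idx but absent from used (the LEFT JOIN ... IS NULL
    anti-join), skipping PRIMARY and UNIQUE indexes."""
    return sorted(
        key
        for key, non_unique in all_idx.items()
        if key not in used and key[2] != "PRIMARY" and non_unique == 1
    )


for schema, table, index in droppable(all_indexes, used_indexes):
    print(schema, table, index)
```

Note that rental_date is never a candidate, even though it was not read, because it is a UNIQUE index (NON_UNIQUE = 0).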
From here I can use some clever SQL to generate the precise ALTER TABLE statements to drop these indexes, an exercise left to the reader.
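Here is one way to approach that exercise, sketched in Python rather than SQL. It groups the droppable (schema, table, index) rows by table and emits one ALTER TABLE ... DROP INDEX statement per table, quoting identifiers with backticks. The drop_statements and quote_ident helpers and the sample rows are hypothetical. Review the generated statements before running any of them.

```python
from collections import defaultdict


def quote_ident(name):
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def drop_statements(rows):
    """Turn (schema, table, index) rows into one ALTER TABLE per table."""
    by_table = defaultdict(list)
    for schema, table, index in rows:
        by_table[(schema, table)].append(index)
    statements = []
    for (schema, table), indexes in sorted(by_table.items()):
        drops = ", ".join(f"DROP INDEX {quote_ident(i)}" for i in sorted(indexes))
        statements.append(f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} {drops};")
    return statements


# Hypothetical sample rows, in the shape returned by the droppable_indexes view.
droppable_rows = [
    ("sakila", "rental", "idx_fk_staff_id"),
    ("sakila", "rental", "idx_fk_customer_id"),
    ("sakila", "film", "idx_title"),
]

for statement in drop_statements(droppable_rows):
    print(statement)
```

Combining the drops per table means each table is altered once rather than once per index. Also be aware that MySQL refuses to drop an index that a foreign key constraint still needs. Several of the idx_fk_* candidates in the list above may therefore fail unless the corresponding constraint is dropped first or another index can serve it.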
Estimating the size of these indexes
But, what if we want to see if it’s worth doing first? Do these indexes actually represent a significant enough amount of disk space for it to be worth our while?
We need more information to answer this question, but fortunately Percona Server has it in the ‘index_total_pages’ column of the INFORMATION_SCHEMA.INNODB_INDEX_STATS table. An InnoDB page is usually 16KB (16,384 bytes), so multiplying the page count by 16,384 gives the approximate disk space an index uses.
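The arithmetic the updated views rely on is simple enough to check in a few lines of Python. This sketch assumes the default 16KB innodb_page_size, which is what the views' hardcoded 16384 assumes too. It uses half-up decimal rounding to approximate MySQL's ROUND(). The function names are illustrative.

```python
from decimal import ROUND_HALF_UP, Decimal

# Assumes the default InnoDB page size of 16KB; adjust if innodb_page_size differs.
DEFAULT_PAGE_SIZE = 16 * 1024  # 16,384 bytes


def index_size_bytes(total_pages, page_size=DEFAULT_PAGE_SIZE):
    """Same formula as (index_total_pages * 16384) in the all_indexes view."""
    return total_pages * page_size


def index_size_mb(total_pages, page_size=DEFAULT_PAGE_SIZE):
    """Approximates ROUND(index_total_size / (1024 * 1024), 2), rounding half up;
    MySQL's intermediate DECIMAL precision may occasionally change the last digit."""
    mb = Decimal(index_size_bytes(total_pages, page_size)) / Decimal(1024 * 1024)
    return mb.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 27 pages is the rental_date figure from the Sakila output in this post.
print(index_size_bytes(27), index_size_mb(27))
```

For example, 27 pages works out to 27 × 16,384 = 442,368 bytes, or about 0.42MB.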
Let’s go update our all_indexes view to include this information:
mysql> drop view if exists all_indexes;
mysql> create view all_indexes as
  select t.table_schema as TABLE_SCHEMA,
         t.table_name as TABLE_NAME,
         i.index_name as INDEX_NAME,
         i.NON_UNIQUE as NON_UNIQUE,
         count(*) as COLUMN_CNT,
         group_concat(i.column_name order by SEQ_IN_INDEX ASC SEPARATOR ',') as COLUMN_NAMES,
         s.index_total_pages as index_total_pages,
         (s.index_total_pages * 16384) as index_total_size
  from information_schema.tables t
  join information_schema.statistics i using (table_schema, table_name)
  join information_schema.innodb_index_stats s using (table_schema, table_name, index_name)
  where t.table_schema like 'sakila%'
  group by t.table_schema, t.table_name, i.index_name;
Now we can see index sizing information in the all_indexes view:
mysql> select * from all_indexes\G
...
*************************** 33. row ***************************
     TABLE_SCHEMA: sakila
       TABLE_NAME: rental
       INDEX_NAME: rental_date
       NON_UNIQUE: 0
       COLUMN_CNT: 3
     COLUMN_NAMES: rental_date,inventory_id,customer_id
index_total_pages: 27
 index_total_size: 442368
...
Now we just need to update our droppable_indexes view to use that information:
mysql> drop view if exists droppable_indexes;
mysql> create view droppable_indexes as
  select all_indexes.table_schema as table_schema,
         all_indexes.table_name as table_name,
         all_indexes.index_name as index_name,
         ROUND(all_indexes.index_total_size / (1024 * 1024), 2) as index_size_mb
  from all_indexes
  left join used_indexes using (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME)
  where used_indexes.INDEX_NAME is NULL
    and all_indexes.INDEX_NAME != 'PRIMARY'
    and all_indexes.NON_UNIQUE = 1
  order by index_size_mb desc;
Now we can easily see how much space each index would free if we dropped it (not much in this case, with test data). Note that film_text.idx_title_description is no longer listed: the inner join against INNODB_INDEX_STATS only returns indexes that have InnoDB statistics, so non-InnoDB indexes drop out of this view.
mysql> select * from droppable_indexes;
+--------------+---------------+-----------------------------+---------------+
| table_schema | table_name    | index_name                  | index_size_mb |
+--------------+---------------+-----------------------------+---------------+
| sakila       | payment       | fk_payment_rental           |          0.27 |
| sakila       | rental        | idx_fk_customer_id          |          0.27 |
| sakila       | rental        | idx_fk_inventory_id         |          0.27 |
| sakila       | rental        | idx_fk_staff_id             |          0.19 |
| sakila       | payment       | idx_fk_staff_id             |          0.17 |
| sakila       | payment       | idx_fk_customer_id          |          0.17 |
| sakila       | inventory     | idx_store_id_film_id        |          0.11 |
| sakila       | inventory     | idx_fk_film_id              |          0.08 |
| sakila       | film_actor    | idx_fk_film_id              |          0.08 |
| sakila       | film          | idx_title                   |          0.05 |
| sakila       | film          | idx_fk_original_language_id |          0.02 |
| sakila       | city          | idx_fk_country_id           |          0.02 |
| sakila       | film_category | fk_film_category_category   |          0.02 |
| sakila       | customer      | idx_last_name               |          0.02 |
| sakila       | store         | idx_fk_address_id           |          0.02 |
| sakila       | actor         | idx_actor_last_name         |          0.02 |
| sakila       | customer      | idx_fk_address_id           |          0.02 |
| sakila       | staff         | idx_fk_address_id           |          0.02 |
| sakila       | film          | idx_fk_language_id          |          0.02 |
| sakila       | address       | idx_fk_city_id              |          0.02 |
| sakila       | customer      | idx_fk_store_id             |          0.02 |
| sakila       | staff         | idx_fk_store_id             |          0.02 |
+--------------+---------------+-----------------------------+---------------+
22 rows in set (0.02 sec)
Recovering filesystem space
Astute InnoDB experts will realize that this isn’t the end of the story when it comes to reclaiming disk space. You may have dropped the indexes, but the tablespaces on disk are still the same old size. If you use innodb_file_per_table, you can rebuild the tablespace for your table by simply running:
mysql> alter table mytable ENGINE=Innodb;
However, this blocks, and on a large table it can take quite some time. All the normal tips and tricks for running a long, blocking schema change without affecting your production environment apply here, and are out of scope for this blog post.
Happy hunting for those unused indexes!
Tim Kadlec broadens the scope of responsive web design to include bandwidth and hardware capabilities. Images too big? Another JS library? It’s time to work off the seasonal weight gain from your responsive website.
A design contract is like a business card—it comes from the same desk, and bears the same creative mark. But it’s also the business card you hate handing out: a folder of legal gibberish with terrible formatting that reminds the client of everything that could possibly go wrong before the work has even started.
Is this just a necessary evil? Why can’t contracts evolve like everything else?
Actually, they can—and should. Modernizing your contract will not only make it match your carefully crafted brand, but it can also help you reach an agreement faster, and even strengthen your position when negotiating. This is not an easy task. Legal content is a delicate matter, and you definitely can’t start tweaking your contract like it’s a blog post.
Before we start modernizing contracts, we first have to understand their purpose, and how and why they got the way they are. It’s a long journey back.
Five Roman principles of contracts still valid today
The Romans developed a sophisticated system of commercial law that has become the foundation for pretty much all of the Western world’s legal systems. A design contract was probably signed to make the incredible decorations of Ara Pacis. Such a contract would have been created to accomplish something not that different from today’s products of design: defining what must be done, the deadline, the client’s approval, and the price. The concept of copyright did not exist yet, but unauthorized and fraudulent copies of literary works were socially unacceptable. (As for non-literary works, good luck copying those marble statues.)
While our work has evolved, contracts have essentially stayed the same—for a number of good reasons. In fact, several principles are just as important in today’s contracts as they were in Roman times.
1. Verba volant, scripta manent
Spoken words fly away, written words stay.
In a world where few people could read or write, a written contract was much more difficult to obtain, and therefore much more valuable than a handshake. Romans were the first to establish a now-universal principle of civil procedure: The burden of proof is on the plaintiff (onus probandi incumbit ei qui dicit). Therefore, a written contract protects the wronged party. This is still true today, so don’t stop at signing a written contract before work begins; make sure every modification is documented in writing, too.
That’s a much easier task today than in Roman times. You don’t have to run to a scribe, or even a notary. E-mail has been proved legally binding multiple times, so to amend a contract, you can just drop a line like, “As discussed in today’s meeting, we mutually agree to modify the statement of work as follows…”
Some contracts even have a clause that requires all amendments to be in writing. If that’s the case, you’ll want to make certain you follow it; otherwise, the client can make excuses for not paying you for extra work.
AIGA’s standard agreement for design services uses a nifty solution to make sure all modifications are in writing and that there’s a limit to the number of modifications that can be requested:
4.2 Substantive Changes. If Client requests or instructs Changes that amount to a revision of at least 15% of the time required to produce the Deliverables, and or the value or scope of the Services, Designer shall be entitled to submit a new and separate Proposal to Client for written approval. Work shall not begin on the revised services until a fully signed revised Proposal and, if required, any additional retainer fees are received by Designer.
As you can see, it’s the same old verba volant, scripta manent still in use.
2. Aliquid dare, aliquid retinere
Give something, keep something.
The value of a project depends not only on what you put in a contract, but also what you leave out. This is particularly true for design, which is not strictly a product, nor strictly a service. It’s a hybrid set of “deliverables,” and the contract (not the e-mail with the design attached) is the place where you give them to your client.
Be wary of what you give and keep. If possible, hold onto copyright: Delay the assignment, or the effective date of the license, until the money is in the bank. This is the best leverage you have.
Clients will try to do the same with payment, of course. Welcome to contract negotiation.
On this inevitable battlefield, details make a difference. For example, imagine you are an illustrator who creates a set of characters for a story. Your client picks the ones they like, and those are the deliverables they buy. Why shouldn’t you keep the rest, and “recycle” them for future projects? If you don’t specify this in the contract, the client will be assigned all the work in connection with the project, including unused sketches.
Same thing if you are delivering code. It’s common to incorporate snippets of code into multiple projects, but just because that code ends up in that project doesn’t mean that client owns it. These are usually called “design tools” in a contract—which means instead of giving something away, you’re simply giving your client permission to continue using the tools.
3. Leges sine moribus vanae
Laws are useless without customs.
Just as graphical and technical standards are essential to designing, standards and industry practices play a crucial role in negotiating contracts. Following best practices not only lowers transaction costs and streamlines the process, but also fosters more balanced deals.
What are the contractual standards of design? The AIGA agreement mentioned earlier is a great start, but standards can also live in single clauses. Eric Adler, a lawyer who works with creative professionals, knows which clauses of his contract are more likely to be negotiated, and takes care to explain those to his clients.
An excerpt from Eric Adler’s contract annotations.
When it comes to liability, Adler suggests that it’s standard to cover your asse(t)s up to the overall net value of the project. You could try to ask for more, but no one wants to make a client nervous over legal boilerplate, and standards make sure this doesn’t happen.
Standards don’t just come from lawyers or unions. Andy Clarke’s Contract Killer is extremely popular among freelance designers—in fact, a version of his contract is one of the most viewed and downloaded items at my company, Docracy, which provides an open collection of legal documents. This is likely due to Clarke’s strict no-legalese policy. He even dropped the classic impersonal language, transforming it into a natural dialogue with the client: “What both parties agree to do.”
The result is a set of informal yet clear rules that cover essential legal provisions, like assigning copyright only upon full payment and reserving portfolio rights.
But where is all the horrible small print?
There is none. This contract shows that it’s possible to enter a binding agreement using everyday English. Your lawyer may not like it, because he may fear not being taken seriously enough, or feel uneasy not following his standard. Fortunately, this is something that has actually changed since the Romans. They had to use formulae and magic words to make sure the contract would be upheld in court, while we typically enjoy shared language and literacy skills.
4. Clausulae insolitae indicunt suspicionem
Unusual clauses will raise flags.
We all like standards, but let’s face it: Everything is negotiable, and people will always try to sneak advantageous clauses into the contract. You need to make sure you don’t sign anything you’ll regret, and spotting bad provisions is not a lawyers-only job. Scanning contracts is a necessity sometimes, so always look closely at the following parts:
- Parties, particularly when companies are involved: Make sure the people you’re dealing with have the power to bind their companies.
- IP provisions: Who owns copyright and when, and what the licensing limitations are.
- Your representation and warranties—the fewer, the better: underpromise and overdeliver!
- Termination: What happens if someone wants to get out of the deal early?
- Dispute resolution: The clause no lawyer ever wants to give up. Watch this one, because you don’t want to let a client drag you to a court a thousand miles away. If you can agree to arbitration or mediation, even better.
The more contracts you read, the better you’ll get at spotting weird provisions. Trust your judgement: If something doesn’t seem quite right, it probably isn’t.
At minimum, you should ask for an explanation. This is never a waste of time. If you have a lawyer do this, just find someone who doesn’t bill by the hour, or this negotiation will take forever.
5. Pacta sunt servanda
A deal is a deal.
Both in Roman times and today, if you don’t deliver, it’s on you. Keeping promises is fundamental for a professional reputation. That’s why you have to be clear and consistent in the promises you make.
How do inconsistencies arise? One common way is having a statement of work (SOW) that’s not compatible with a master service agreement (MSA). This happens more often than you might think, particularly if no one has ever read that thirty-page agreement. If it’s not clear which one prevails (yes, you have to write it down), you can find yourself in a legal mess.
For example, capping your hours in the MSA is a great way to mitigate the fixed-fee or milestone-based pricing you agreed to in the SOW, but only if the cap prevails! Conversely, if you know you’ll only be looking at the SOW and all the special payment provisions are in there, then it should probably override any older pricing rule buried in those thirty pages of small print.
Even better, an MSA doesn’t really need to include thirty nasty pages of small print.
Making a modern contract
I bet you didn’t read iTunes’ latest Terms and Conditions before clicking “I Agree.” We try to read contracts when we think it’s important, but it’s not easy, for several reasons:
- Contracts are optimized for print, but today we read mostly on screen.
- They are often poorly formatted and typographically awful.
- Many elements are difficult to read, like definitions and ALL CAPS PARAGRAPHS.
- They’re full of legal jargon, not plain language.
The good news is, these problems can be fixed.
Typography
Let’s start with font. Designers and clients alike now mostly read on screens. Electronic signing is a reality, so there are few arguments for optimizing a contract for print.
If you’ve studied typography, you know how to use contrast, proximity, and alignment to create emotional and persuasive effects, and you can apply these same principles to legal text.
Matthew Butterick, author of Typography for Lawyers, has even developed a font optimized for legal text: Equity, a serif font that also looks good on screen—a nice compromise. Whatever you choose, ensure you give your contract balance and contrast.
Typesetting
Contracts are a very peculiar subset of legal documents. How can you use typesetting skills to improve their layout?
- Structure them in nested lists. HTML does such a great job handling nested lists and headings, so why use a crappy text editor? You often see reckless tabbing and manual line breaks made by frustrated people desperately trying to keep order. Using tools of the trade like Markdown, LaTeX, and Illustrator, you can do better in no time.
- Divide the boilerplate from the custom terms. Highlight relevant content like party names, important numbers, and percentages so they stand out from the boilerplate and can be easily skimmed.
- Make important clauses stand out, but never use all caps. The law only asks the drafter, in specific situations, to highlight certain provisions—and there are ways to do that without sacrificing readability. If your lawyer thinks differently, she’s wrong.
- Allow longer lines. Words need to “breathe,” but contracts also need to cluster like clauses for readers. For this reason, line length is a delicate choice that depends both on the length of the clauses in your contract and on the font you choose to use. If you opt for a sans serif, you might get away with longer lines, but be sure to keep generous margins and line spacing (ideally, 120 to 145 percent of the point size, according to Typography for Lawyers).
You’ll also need to decide whether to justify or left-align text. The general rule is that justified text only works with proper hyphenation. This means you’ll have to manually input non-hyphenated breaks for the words you want to keep on the same line. Unless you’re drafting the contract yourself from start to finish, this is a daunting task. And, if your contract manages to have short paragraphs, ragged-right looks more natural, particularly on screen.
When we redesigned Docracy’s PDF typography, we opted for a longer line with lots of white space on the sides. This lets even the longest contract breathe, yet creates a compact final look.
Plain language
Now for the million-dollar question: Why are contracts written in legal jargon? Sadly, it’s because lawyers are too lazy and change-averse to rewrite their forms. The good news is, this is changing. And you can contribute; most formulaic “legalese,” like herein, thereof, or hereby, can just be replaced with “this.” You might even be able to remove entire lines, but better check with a lawyer to make sure.
Here’s an example of traditional contract language rewritten in plain English. Not only is the new version half the length, but it’s much easier to understand:
Traditional legal language:
Timing. Designer will prioritize performance of the Services as may be necessary or as identified in the Proposal, and will undertake commercially reasonable efforts to perform the Services within the time(s) identified in the Proposal. Client agrees to review Deliverables within the time identified for such reviews and to promptly either, (i) approve the Deliverables in writing or (ii) provide written comments and/or corrections sufficient to identify the Client’s concerns, objections or corrections to Designer. The Designer shall be entitled to request written clarification of any concern, objection or correction. Client acknowledges and agrees that Designer’s ability to meet any and all schedules is entirely dependent upon Client’s prompt performance of its obligations to provide materials and written approvals and/or instructions pursuant to the Proposal and that any delays in Client’s performance or Changes in the Services or Deliverables requested by Client may delay delivery of the Deliverables. Any such delay caused by Client shall not constitute a breach of any term, condition or Designer’s obligations under this Agreement.
Rewritten in plain English:
Timing. Designer will prioritize the Services as may be necessary, or as identified in the Proposal, and will take reasonable efforts to perform the Services in a timely manner. Client agrees to review Deliverables within the time identified in Schedule A and to either (i) approve the Deliverables in writing or (ii) provide exhaustive written feedback. Designer may request written clarification of any of Client's comments. Delays in the performance of the Services due to Client's late feedback or requested Changes will not constitute a breach of Designer's obligations.
Classical roots, contemporary documents
There are many reasons the core rules of contracts are still in place two millennia after the fall of Rome. But there are other elements that we can, and should, take to the twenty-first century.
If we want to address the readability problems unique to our era—and improve communication with our clients—then it’s time we fix the language, layout, and typesetting of our contracts. And who better than designers to do it? 
- Illustration by Kevin Cornell
Je ne suis pas monsieur Lebowski. C’est vous monsieur Lebowski. Moi, je suis le Duc. (I’m not Mr. Lebowski. You’re Mr. Lebowski. Me, I’m the Duke.)
—The Big Lebowski, French version
There is a world where Harry Potter’s archenemy is “Du-weißt-schon-wer” (“You-Know-Who”), Facebook users click the “Me gusta” (“Like”) button, and the Dude is named “le Duc.” This world is a translated world.
We—the people who make websites—now study almost every aspect of our trade, from content and usability to art direction and typography. Our attention to detail has never been greater as we strive to provide the best possible experience. Yet many users still experience products that lack personality or are difficult to understand.
They are users of a translated version.
When we pledge to embrace the adaptable nature of the web—to make our websites responsive and even future-ready—we’re typically talking about diversity of devices. But the web’s diversity also comes in the form of different languages and cultures.
Translation affects users’ experiences—and our organizations’ success. It’s time we consider translation part of our jobs, too.
Waiting for C-3PO
“Do you want your forum clean like this?”
I had just set up a user forum in French when I stumbled upon this rather bizarre banner. “What makes the forum so clean?” I wondered. “Do they tidy the code every day?” I had to change the language back to English to understand it: “Do you want your own forum like this?”
In French, “propre” means either “own” or “clean,” depending on where it sits: before the noun (“votre propre forum”) it means “own”; after it (“un forum propre”) it means “clean.” The rule is simple; any translator would know it. More precisely, any human translator. Google Translate, the system behind the French version of the forum, obviously wasn’t so sure.
It’s not just Google Translate, either. In the 1950s, Alan Turing, the father of computer science, devised a test to evaluate machine intelligence through conversations. The biggest Turing test ever was held last June to celebrate what would have been Turing’s hundredth birthday. The winner was probably the most advanced chatbot ever created, yet Eugene Goostman—as this bot is named—failed to fool the judges 70 percent of the time. When will machines pass the test? In the year 2029—maybe.
This should come as no surprise. Languages are amongst the richest and most complex systems humankind has ever produced. When machines gain the ability to really speak (and therefore translate), it will be possible to use Google Translate in a professional context—and no doubt we’ll also have Google Design and Google Copywriting by then. But today, Google Translate is to translation what the auto mode is to photography: a quick-and-dirty solution. It comes in handy when you need to get an idea of what’s being said about your project on Weibo (China’s version of Twitter), but it isn’t a good option when you need to translate your website into Spanish.
While we’re waiting for C-3PO, we need professional translators. We must also acknowledge their creativity and recognize them as peers.
Great design deserves great translation
Translating is a respectable, valuable, creative and worthwhile use of a human brain.
—David Bellos, Is That a Fish in Your Ear?
Le Big Lebowski is a masterpiece. I would even argue that it surpasses the original. Everything is just perfect: the dubbing, the humor, the dialogue. The translators retained the essence of the film while adapting it for an audience that has no idea what a “dude” is. They managed to translate not just the words, but the Coen brothers’ genius as well.
E-mail service provider MailChimp is a masterpiece, too. Aarron Walter’s UX team has succeeded in creating a unique personality. Much of this personality manifests itself through copy: the greetings from Freddie, the company’s joke-cracking mascot; the always-relevant error and help messages; and—above all—the “funny but not goofy, informal but not sloppy” voice and tone used throughout the application.
Now, if MailChimp were to be translated into Spanish, Russian, or Chinese, what would become of this personality? What does it mean to be “informal but not sloppy” in Japanese? Should the mascot’s name still be “Freddie Von Chimpenheimer IV” in German, or could that be misinterpreted? Can you greet an Indian user with “Hi. You could be a part-time model”?
There are no easy answers to these questions. Translating is walking a tightrope. The challenge is to remain faithful to the original design while adapting it for a new audience, for a different culture.
If you think a machine can do this, take a look at this Google translation of MailChimp’s success message, “High fives! Your list has been imported”:
Cinco años de alta! Su lista ha sido importado.
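Show that to a Spanish-speaking friend and you’re sure to get a bewildered look: “cinco años” means “five years,” and “importado” doesn’t even agree with the feminine noun “lista.”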
The road ahead
The web is home to plenty of innovation. But when it comes to translation, other industries are far ahead.
If we want to reap the benefits of translation, we must learn what it takes to do it well—and why it matters. Let me give you two examples.
Linguistic validation
The pharmaceutical business may not seem to share much with web design, but it has one best practice that could inspire us: linguistic validation.
Introducing a new drug into the market is a complex and controlled process that includes a long series of trials and reviews. Some of these tests involve the patients themselves, such as Patient-Reported Outcomes questionnaires, which assess whether a drug has actually improved a patient’s quality of life. These questionnaires are written in English by clinicians and then translated into hundreds of languages.
Ordinary translation is usually a two-step process: translation then proofreading (some even skip the proofreading). The linguistic validation of patient questionnaires has a few more steps, such as doing both forward and backward translations and pilot testing.
Why such a complicated and costly process? Two reasons: First, the original version is a precise research instrument. Nothing has been left to chance. Second, it is essential for patients to perfectly understand the questions, because what they report will serve as scientific data. The questionnaire must therefore be intuitive and patient-friendly.
Thoughtfully designed products, user-friendly interfaces—aren’t these what we aim for? If we care equally about all our users, it’s time we start thinking of translation as something slightly more complex than a word-to-word job.
Cultural expertise
Raving Rabbids is a humorous party game designed in Ubisoft’s Paris studio. The development team includes a localization specialist in charge of the game’s eight localized versions. She works hand in hand with designers to ensure their jokes, references, and general craziness are translatable. For the U.S., Rabbids’ biggest market, a duo of Americans from Nickelodeon even gave the team a little extra cultural insight.
It costs millions of dollars to produce a major video game, and even more to target international audiences. Because playing a game is such an immersive experience, the teams behind Rabbids and many other games have found that localization specialists are critical. They are not given a finished product to adapt—they take part early in the project, as their feedback on cultural matters may profoundly change the game’s design.
The game industry prefers the term “localization” to “translation” because the latter is too often restricted to text. This says a lot about how seriously game studios take cultural expertise. Because they know a cultural misfit can stall a game’s chances of success—and they know for every dollar invested in localization, there’s a $25 return.
Because they know that translation—sorry, localization—is UX.
Translate early, translate often
Most startups employ what could be called the lemonade tycoon approach: Start in your neighborhood, amongst the people you know; this is your best bet. Get it right at home before expanding into far-off lands.
I’m not saying you shouldn’t start in your own country. Local knowledge is priceless. But why wait to internationalize? Unlike lemonade selling, the web is international by nature. From day one, your website will be accessible to any person on this planet.
What’s more, procrastination has a cost. According to Smartling, a translation software company, “it can take companies 12-18 months to internationalize their code and launch their first foreign language site, absorbing much of the company’s engineering resources.”
Companies face the same problem when they develop a mobile version of their site afterward. Good thing many now adopt a “mobile first” process.
Perhaps they should consider “foreign first,” too.
It’s a big world out there
When you come from a non-English-speaking country, as I do, a “foreign first” approach is very likely to mean “English first.” But what if you’re based in New York, Manchester, or Auckland? Which language should you go for?
The answer is actually not to think “language,” but rather “opportunity” and “culture”—as these three companies have:
- Wufoo is a popular form builder from Tampa, Florida. At the beginning of 2012, it launched Wufoo Español, its first foreign version. You won’t find the Spanish version at wufoo.es, but at wufoo.com.mx—because it saw an opportunity in a neighboring market, and language was a means to reach that market. Besides, Wufoo doesn’t mix up language and local culture: It plans to roll out additional localized versions for Spain and Argentina.
- CanaDream is a Canadian RV rental company whose website is available in three languages. English and French are obvious choices, but the third one is trickier: German. Again, the company saw an opportunity—Germans love RV travel. But German people generally speak good English, don’t they? Yes, many do—but they will still prefer a company that attends to them in their own language.
- Bla Bla Car is a car-sharing service born in France. Here we can see that “English first” isn’t always the rule. Bla Bla Car’s first foreign version was in Spanish. The car-sharing market was less competitive in Spain than in other European countries, which gave Bla Bla Car the opportunity to test-run its internationalization before moving on to other markets, which it eventually did. Car sharing is getting more and more popular in Europe, and Bla Bla Car aims for leadership in the region. In a multilingual area, this has required translation into seven languages and counting.
Bargain-basement market research
Most startups can’t afford international market research. That’s why they focus on their home market. But just as Paul Boag taught us about bargain basement usability testing, we can find affordable market research techniques, too.
Once you’ve settled on a country to target, go to ProZ and look for a translator or agency based there. Brief her about your project and send your prototype or access to your beta. Ask her to translate the key screens. Even at this stage, you can get lots of feedback: “Are you aware your app name can hardly be pronounced, let alone remembered, by Brazilians?” “I’m sure having Acme Inc. as a client is a great reference in the U.S., but nobody knows them here.” “This photo of a blond-haired, blue-eyed guy probably won’t resonate with a Turkish market.”
Then ask your translator to run a user test using her network of proofreaders. You don’t need hundreds of people—with only ten participants, you’ll uncover any major cultural faux pas. You’ll also gain a general understanding of whether people are interested in the project, what their main questions are, and whether they like the visual design.
Finally, discuss your personas with the translator: Maybe Harriet should be renamed María and relocated to Valparaiso. And what about adding Hugo, the typical backpacker from the Netherlands? With localized personas, all your users will be given equal consideration throughout the design process.
Of course, you’ll need more precise data eventually. But this quick-and-dirty research is enough to get you started. You’ll iterate from there.
Your new teammate
When you start translating early, you make the translator part of your team. Chances are this will be a very rewarding experience. At Novius, my company, it’s changed the way we work.
For major projects, we now create and maintain a glossary, or as I like to call it, a “style spreadsheet.” CSS stylesheets are understood by both designers and developers and guarantee style consistency across an entire website. Similarly, glossaries are by and for the whole team and ensure the consistency of content. Just like you want a color scheme that’s thoroughly followed, you also want to make sure “module,” “plugin,” and “extension” aren’t all used to refer to the same concept. Le fond et la forme (substance and form).
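A “style spreadsheet” can also be checked mechanically. Here is a minimal Python sketch, assuming a hypothetical glossary that maps each preferred term to the variants the team agreed not to use; the name `glossary_violations` is illustrative and not part of any particular tool.

```python
from __future__ import annotations

import re

# Hypothetical glossary: preferred term -> variants the team agreed not to use.
GLOSSARY: dict[str, list[str]] = {"module": ["plugin", "extension"]}


def glossary_violations(
    text: str, glossary: dict[str, list[str]] = GLOSSARY
) -> list[tuple[str, str]]:
    """Return (variant found, preferred term) pairs, in glossary order."""
    found: list[tuple[str, str]] = []
    for preferred, variants in glossary.items():
        for variant in variants:
            # Whole-word, case-insensitive match; also catches a simple plural "s".
            pattern = rf"\b{re.escape(variant)}s?\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.append((variant, preferred))
    return found
```

A check like this could run over interface strings before they are sent to the translator, so the source text is consistent before anyone has to translate it.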
We have also learned that a quality translation begins with the code. Developers strive for reusable code, and strings are no exception. Depending on how a developer handles them, he could make the translator’s job straightforward, or virtually impossible.
When dealing with sentences like, “1 person has this question” and “X people have this problem, including you,” translators are often asked to translate strings like: “person has,” “people have,” “this,” “question,” “problem,” and “including you.”
Even with context, deconstructing these sentences is a translator’s nightmare. For languages with gender, the string “this” is untranslatable (e.g., esta pregunta and este problema in Spanish). In many languages, like Russian, plurals take several forms (e.g., for the plural “persons,” you would say four челове́ка, but five челове́к). And the list goes on.
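To make the plural problem concrete, here is a minimal Python sketch of the three-form plural rule that gettext catalogs commonly declare for Russian, applied to the word from the example above (written without stress marks). The names `russian_plural_index` and `people_count` are hypothetical; a real project would get this rule from its internationalization library rather than hand-code it.

```python
PERSON_FORMS = ("человек", "человека", "человек")  # "one", "few", "many" forms


def russian_plural_index(n: int) -> int:
    """Pick one of three plural forms, mirroring the usual Russian Plural-Forms rule."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, 101, ...
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1  # 2-4, 22-24, 32-34, ...
    return 2  # 0, 5-20, 25-30, 111-114, ...


def people_count(n: int) -> str:
    return f"{n} {PERSON_FORMS[russian_plural_index(n)]}"
```

An English-only pair of fragments such as “person has” / “people have” has no slot for the extra form, which is one reason translators need whole sentences plus the language’s own plural rule.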
Since language isn’t code, developers and translators have a lot to learn from each other. Translators will tell them the software they use has translation memory, so there’s no need to avoid repetition. They will discuss how to handle variables in text. They will also decide together which internationalization system (such as gettext) and text file format (like XML or PO) to use.
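As a minimal sketch of what keeping whole sentences and their variables together can look like, Python’s standard gettext module lets code request a singular/plural pair as one unit. The example uses `gettext.NullTranslations`, which simply returns the English singular or plural; with a real compiled catalog, the translator’s forms and the language’s own plural rule would be used instead. The function name `question_status` is hypothetical.

```python
import gettext

# Fallback used when no catalog is installed: ngettext() returns the first
# string when count == 1 and the second string otherwise.
translations = gettext.NullTranslations()


def question_status(count: int) -> str:
    # The whole sentence is a single translatable unit, and the number is a
    # named placeholder that a translator can move anywhere in the sentence.
    template = translations.ngettext(
        "%(count)d person has this question",
        "%(count)d people have this question",
        count,
    )
    return template % {"count": count}
```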
Not a one-off thing
I won’t lie to you. Once you’ve translated your website, you’re in for good. People don’t care that they’re using a translated version. For them this is the only version. So you’ll have to keep translating.
They will hate being considered “second-rate” users. Once you’re out of beta, 90 percent translated is not OK. How would your users feel if every website update resulted in a buggy mobile version? Users of translated versions experience this all the time, with English text suddenly popping up out of nowhere. To make it worse, the newest features—proudly announced and long-awaited—are usually the ones left partially translated. Users do get the message: You’re not important enough for us to prioritize translation quality.
While good localization boosts conversion rates, bad or partial translation may ruin a user experience, giving users an uneasy feeling about the whole company: If they can’t even get their website right, how bad will the customer support be?
In fact, I recently chose not to purchase a service because of a pricing page that proclaimed, “Give a price to these ladders with your growing company.”
Guess what it was selling? Translation software.
A multilingual web
If I am selling to you, I speak your language. If I am buying, dann müssen Sie Deutsch sprechen [then you must speak German].
—Willy Brandt, West German political leader
The language of the web is English as much as HTML. If the web had a capital, it would be somewhere around San Francisco Bay. Web professionals worldwide use English expressions in almost every sentence: Like, browser, responsive, Tweet, SEO, etc.
However, 73 percent of internet users don’t speak English, and their numbers are growing. We now enter the age of glocalisation.
In our move toward universal design, we must not forget languages and the people who master them. “Translating is writing,” said French writer Marguerite Yourcenar.
Today we can also say, translating is designing. 
- Illustration by Kevin Cornell
One of the most vexing problems that still seems to be facing people when I talk to them about HTTP APIs is how to handle versioning and extensibility — i.e., how they evolve.
I tend to think about this a lot and talk to quite a few people about it, since I’m intimately familiar with the approaches to versioning that the HTTP protocol itself takes, and with the general attitude taken to it in the broader Internet architecture by the IETF.
So, I was quite interested to come across Tom Preston-Werner’s effort to define Semantic Versioning. If you haven’t seen it yet, go have a read; it’s a very sensible explanation of how to evolve software.
Let’s apply his “Firetruck” example to services. If you depend on a Ladder service, you have many of the same concerns; you need an instance of it that supports the semantics you understand (major version) and the additional features that you need on top (the minor version). You might also be interested in the patch level, in case you need to do some debugging.
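The major/minor/patch reasoning above can be written as a small compatibility check. This is a minimal Python sketch, assuming versions are plain `MAJOR.MINOR.PATCH` strings (pre-release and build tags from the Semantic Versioning spec are not handled); `Version` and `satisfies` are hypothetical names.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Accepts "1.2.3" or "v1.2.3"; raises ValueError on anything else.
        text = text[1:] if text.startswith("v") else text
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def satisfies(offered: Version, required: Version) -> bool:
    """Can a service at `offered` serve a client that needs `required`?"""
    if offered.major != required.major:
        return False  # a different major version means different semantics
    # Minor versions only add backwards-compatible features, so any minor
    # version at or above the required one will do. The patch level is
    # ignored here: per the text, it mainly matters when debugging.
    return offered.minor >= required.minor
```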
The interesting, confusing and contentious part of applying this common sense to HTTP services is figuring out what you’re versioning, and how to communicate the version.
One viewpoint is to say that the service is being versioned, the service is identified by the URL, and therefore the version goes in the URL, like this:
http://api.example.com/v1.2.3/ladder
However, there are (at least) two issues to consider with this approach.
First of all, it’s coarse-grained, in that you can’t evolve parts of the system independently. For example, consider introducing a new format for the “ladder” resource: by the rules, that means a new minor version, which means that the URL should now become:
http://api.example.com/v1.3.0/ladder
which, as discussed previously in the API Versioning Smackdown, creates a whole new tree of resources, and tightly couples the client and server.
While that may be fine if you have a very simple API with no interdependencies — such as serving a bunch of JavaScript, as Nicholas Zakas describes — it will quickly become a huge headache for testing, support and operations if you have to support all of the combinations of possible resources and their interactions in a more complex one.
Second, it intermingles the version with identifiers. Because URIs are used on the Web as the fundamental identifier by so many things (caches, spiders, forms, and so on), embedding information that’s likely to change into them makes them unstable, thereby reducing their value. For example, a cache invalidates content associated with a URL when a POST request to that URL flows through it; if different versions of the URL are floating around, a POST to one of them won’t invalidate what is cached under the others.
This is actually very similar to the discussion about the X- prefix in HTTP header names; putting a flag into the name to signify “experimental” doesn’t make much sense when the experiment ends and the header gets used “for real.”
Suggested Practices
With that in mind, what should HTTP APIs do? My current thinking (based on the thinking of a lot of other people ;) is below.
Keep Compatible Changes Out of Names
As per above, names should be stable over time, and should identify a known set of semantics — corresponding to the major version number in Semantic Versioning. By “names,” I mean everything that’s used as an identifier, whether it’s a URL, a media type, a link relation name, HTTP header, whatever.
From what I see, most HTTP APIs are already moving in this direction, with structures like this:
http://api.example.com/v2/ladder
Here, only the major version number is put in the URL; the minor and patch versions don’t go in, because backwards-compatible changes don’t need to be signified by changes in the name. Of course, it’d be equally valid to do:
http://api2.example.com/ladder
because the hostname is just as much an identifier as anything else.
Even with this approach, it’s worth noting that letting clients infer that it’s a v2 API just from that path segment is a dodgy thing to do; however, the deeper reasons for this are the subject of a different blog post.
Likewise, minor and patch versions shouldn’t go into other names, such as media types or link relations. There is a train of thought that you don’t need to have numeric major versions at all, since you can call the first one “foo” and the second one “bar” — but that’s just a matter of taste.
Avoid New Major Versions
This is also pretty widely agreed to. Every time you release a new major version, people have to look at it, understand it, write new software to it, debug it, and so on. This is a huge investment on both sides, since you also have to support two (or more) major versions concurrently for some sort of sunset period.
So, new major versions should be few and far between. In a perfect world, there would be none, but the reality is that every once in a while, you need to clean up a messy API or otherwise make breaking changes. Just make them last as long as you can.
Make Changes Backwards-Compatible
The biggest way to avoid new major versions is to make as many of your changes backwards-compatible as possible.
For example, if you want to add support for a new HTTP method, or add a new resource to the mix, this doesn’t necessitate a new version. Likewise, adding support for a new format can be achieved through the miracle of content negotiation. Need to change the meaning of an existing query argument? Don’t — instead, introduce a new one.
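Here is a minimal sketch of adding a new format through content negotiation without bumping the API version. The function name `negotiate` and the `text/csv` format are hypothetical, and the `Accept` parsing is simplified: it ignores the specificity rules in the HTTP specification (for example, a more specific `q=0` entry excluding a type that a wildcard would otherwise match).

```python
from __future__ import annotations


def negotiate(accept_header: str, available: list[str]) -> str | None:
    """Return the available media type the client prefers most, or None.

    None would typically become a 406 Not Acceptable response.
    """
    preferences: list[tuple[float, int, str]] = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        media_range, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        if media_range:
            # Higher q first; for equal q, earlier entries win (-position).
            preferences.append((q, -position, media_range))
    for q, _, media_range in sorted(preferences, reverse=True):
        if q <= 0:
            break  # q=0 means "not acceptable", and every remaining entry is <= 0
        for media_type in available:
            if media_range in ("*/*", media_type) or (
                media_range.endswith("/*") and media_type.startswith(media_range[:-1])
            ):
                return media_type
    return None
```

Old clients that only ever send `Accept: application/json` keep receiving JSON after `text/csv` is added to the list of available formats, so the new format is a backwards-compatible change.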
Surprisingly, removing support for something can be considered backwards-compatible too. Think about it; if you remove support for, say, the “foo” resource and return a 410 Gone, clients will break. However, it will only break those clients that use it; those that don’t can still smoothly interoperate. Introducing a new major version to say “I don’t support the foo resource” effectively breaks everybody, so it doesn’t do any good.
That naturally leads to…
Think about Forwards-Compatibility
Fundamentally, evolution is about figuring out how to limit the breakage that your changes incur on clients. As such, you need to set some expectations and boundaries on how your clients should behave when they encounter certain circumstances.
In other words, if a client is hardcoded to only work when “foo” is there, “foo” will always have to be there to avoid breaking it. More subtly, if clients don’t expect an extra HTTP header, or an extra member on a JSON object, that can cause problems too, and restrict your options down the road.
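One common way to keep a client forwards-compatible is to read only the JSON members it understands and ignore the rest. A minimal Python sketch, assuming a hypothetical ladder representation with `id` and `height_m` members:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Ladder:
    id: str
    height_m: float


def parse_ladder(body: str) -> Ladder:
    data = json.loads(body)
    # Pick out only the members this client knows about. Anything the server
    # adds later (e.g. a hypothetical "cover" member) is silently ignored.
    # By contrast, Ladder(**data) would raise TypeError on unknown members.
    return Ladder(id=str(data["id"]), height_m=float(data["height_m"]))
```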
Expressing these expectations clearly is something we as an industry need to get a lot better at. Unlike Web browsers, which are extremely forgiving of unexpected input, most API clients are incredibly brittle.
For example, XML Schema got this pretty fundamentally wrong, making it very difficult to express a forwards-compatible schema (in 1.0). JSON is better, mostly because implementations are tied pretty closely to dynamic programming language data structures, rather than a schema language.
That’s not to say that you want everything to be extensible. Sometimes it’s a good idea to explicitly NOT allow forward compatibility. For example, in the JSON Patch format, we realised that any extensions were likely to be a fundamental change in the document semantics, thus requiring a new major version (in this case, identifying it with a new media type). After all, an old client that doesn’t respect a new patch operation is going to come up with a different result than the new one that does understand it. So, we disallowed most extensions in that format.
The trick is to think about it and document your expectations carefully, whether you’re designing a format, a link relation type, or an HTTP header. Tell clients to expect and handle (as gracefully as they can) responses like 410 Gone, 405 Method Not Allowed, and 415 Unsupported Media Type.
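As a sketch of what handling these responses gracefully might mean in client code: try the preferred request format, fall back on 415 Unsupported Media Type, and treat 410 Gone as a withdrawn feature rather than a crash. The `send` callable, the status-only return value, and the result strings are all assumptions made for this Python sketch, not part of any particular HTTP library.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical transport: send(url, content_type, body) -> HTTP status code.
Send = Callable[[str, str, bytes], int]


def submit(send: Send, url: str, payloads: Sequence[Tuple[str, bytes]]) -> str:
    """Try each (content_type, body) pair, most preferred first."""
    for content_type, body in payloads:
        status = send(url, content_type, body)
        if status == 415:
            continue  # Unsupported Media Type: fall back to the next format
        if status == 410:
            return "gone"  # resource removed: disable the feature, keep running
        if status == 405:
            return "method not allowed"  # check the Allow header before retrying
        return "ok" if 200 <= status < 300 else f"failed ({status})"
    return "no acceptable format"
```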
Version at Appropriate Granularities
Going back to the example above:
http://api.example.com/v2/ladder
a natural question to ask is “what is that ‘v2’ versioning?”
To me, the most natural reading is that it identifies a collection of resources representing a set of functionality; hierarchical URLs have collection semantics, so putting those resources under a single path segment (“directory”) groups them together.
This is an important distinction; v2 is identifying NOT just the ladder, but the whole interface, as a collection of resources (i.e., everything under that “v2”) that works together.
Another conversation that I have sometimes is about how to relate format changes to API versions. In short, they should be completely separate; formats can have lives of their own, and to get the most value out of them, they should do so. It’s fine to say “Version 2 of the API requires the foo resource to support version 5 of the bar format,” of course.
Make Minor and Patch Information Discoverable
There are legitimate needs for minor and patch information; just because we say “don’t put them into names” doesn’t mean that they shouldn’t exist. However, I am fairly skeptical that they do much good “on the wire” as they are.
For example, consider our Ladder service, version 1.2.3. Maybe we added support for a new HTTP method in version 1.2, and fixed a few bugs in patch level 3.
A self-describing service will make it completely evident (e.g., using the Allow response header) that the new method is supported; if it needs to be known about ahead of time (for example, to reflect it in a UI), you can advertise support for that method directly (e.g., with something like this embryonic mechanism).
Tying this information up in a version number only makes the client go and look up a chart of version numbers to see whether the feature they want is supported by the given version; instead, if they can directly interrogate the interface to see if it supports the fancy new “ladder cover” feature (or whatever), it’s a lot more flexible and useful. The same goes for new resources, new formats, and so on.
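Interrogating the interface directly can be as simple as reading the Allow header that the server returns (for example, on an OPTIONS request or a 405 response) instead of comparing version numbers against a chart. A minimal Python sketch; fetching the header is left out, and the function names are hypothetical:

```python
from __future__ import annotations


def allowed_methods(allow_header: str) -> frozenset[str]:
    """Parse an Allow header value such as "GET, PUT, PATCH"."""
    # Method names are case-sensitive in HTTP, so they are not normalized.
    return frozenset(
        method.strip() for method in allow_header.split(",") if method.strip()
    )


def supports_partial_update(allow_header: str) -> bool:
    # Ask "is PATCH supported?" directly, rather than "is the version >= 1.2?".
    return "PATCH" in allowed_methods(allow_header)
```

If a later release stops supporting PATCH, the same check simply returns False, without the client needing to know which version number made the change.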
Aside from that, using linear, numeric minor versions for negotiating new features is really, really limiting; complex APIs will find this especially impractical.
So, where should the minor and patch version numbers go? Easy — it’s useful for the release notes, and a few other forms of documentation. It’s useful for marketing. In the case of a more complex API (as with most standards — whether they come out of a standards body or an open source project), it’s useful for packaging up an agreed-to set of functionality and calling that a spec release.
Mind you, there’s a strong case for including this information about the implementation of the API — server or client side — but that goes in the Server or User-Agent header respectively, and is completely separate from API versioning (i.e., you might have version 0.2.1 of the client accessing version 3.2.3 of the server’s implementation of the API, which itself might have a version of 1.0.3).
Geri Coady extends goodwill to all with some insights about colour and how it impacts everyone using our sites and apps. Full of practical tips and tools, this gift keeps on giving.
As Percona Live London is raging in the UK, I thought it fitting to remind everyone about the next big Percona Live: MySQL Conference and Expo 2013 in Santa Clara, California on April 22-25, 2013. You can register NOW for this conference, and the Super Saving Registration deadline is December 28th, so be sure to register early if you already know you’ll be there.
As promised, I wanted to keep giving updates on the committee’s progress in selecting the program for the conference. As many people know, the selections and schedule for tutorial day have been made public. There are many great tutorials being presented, and I don’t have time to laud them all, so let me give you a few (highly biased) highlights:
Operational DBA In A Nutshell! Hands On Tutorial!
This tutorial will be an extremely hands-on overview of all things pertaining to being a MySQL DBA and would be an excellent choice for anyone who is either just getting started with MySQL or has been doing DBA work on the side and wants to dig further. Also, it is being given by my excellent Belgian Percona colleagues Liz, Fred, and Kenny, who are fantastic presenters and trainers.
This is one of 3 tutorials that will span the full 6-hour day, and I just couldn’t see covering all this material in less time.
Cookbook for Creating INDEXes — All about Indexing
Now for one of my former colleagues: Rick James of Yahoo! Inc. Rick is giving a 3-hour tutorial focusing on indexing within MyISAM and InnoDB, with a side of partitioning. I’d expect folks interested in efficient schema design for high performance to get a lot out of Rick’s talk.
A good afternoon follow up to Rick’s talk might be:
Advanced query optimizer tuning and analysis
This 3-hour session is being given by Timour Katchaounov and Sergei Petrunia from Monty Program and focuses on how the query optimizer does what it does, why it does it, and how to make it work for you. These two are probably among just a handful of developers in the world who have really dug into the optimizer and have a solid understanding of exactly how it works.
These two talks would be an excellent track for anyone who does schema design and writes a lot of queries.
High Availability Tutorials
One of the clearly emerging themes this year is High Availability, though honestly I’m not sure when this was not the case at a MySQL conference. There are several MySQL HA related tutorials:
- Using Tungsten Replicator to solve replication problems (3 hr)
- Ramp-up tutorial for MySQL Cluster – Scaling with continuous availability (6 hr)
- Percona XtraDB Cluster / Galera in Practice (6 hr)
So what’s next? The committee is furiously working on rating the huge number of session proposals. Just today, I finished a multi-week process of rating each and every talk (except my own). Unlike the tutorial selection process, where there were a few dozen proposals, there are hundreds of session proposals to sort through. Everyone on the committee has his or her own opinion about what’s most important for talks, and it’s through our combined effort that we (hopefully) end up with a schedule of talks that is well rounded and represents where the community is today and where it is headed in the future (and, of course, encourages as many people as possible to attend!).
As always, feel free to ask questions. I’d be happy to answer what I can!
I've been seeing this over the past few years. Imagine this scenario:
You have a stored procedure that runs well most of the time but sometimes it's WAYYYYY off. It's almost as though the performance of it went from great to horrible in a split second (like falling off of a cliff). You don't know why but someone says - it's got to be the statistics. In fact, if you have the luxury of time (which most folks don't have), you execute it yourself and you check the plan - WOW, the estimated number of rows is WAY off from the actual rows. OK, it's confirmed (you think); it's statistics.
But, maybe it's not...
See, a stored procedure, a parameterized statement executed with sp_executesql, and a prepared statement submitted by a client ALL reuse cached plans. These plans are created using something called parameter sniffing. Parameter sniffing is not a problem in itself, but it can become a problem for later executions of that same statement or procedure. If the plan for one of these statements was created for parameters that return only 1 row, then the plan might be simple and straightforward: use a nonclustered index and then do a bookmark lookup (that's about as simple as it can get). But if that same procedure, sp_executesql statement, or prepared statement runs again later with a parameter that returns thousands of rows, then reusing the plan that was created by sniffing the earlier parameter might not be good. And this might be a rare execution.
OR, it could be even more strange. These plans are not stored on disk; they are not permanent objects. They are created any time there is not already a plan in the cache, and there are a variety of reasons why a plan can fall out of cache. If it just so happens that an atypical set of parameters is the first one used after the plan has fallen out of cache (better described as "has been invalidated"), then a very poor plan could end up in cache and cause subsequent executions with typical parameters to be way off. Again, if you look at the actual plan, you'll probably see that the estimate is WAY off from the actual. But it's NOT likely to be a statistics problem.
But, let's say that you think it is a statistics problem. What do you do?
You UPDATE STATISTICS tablename or you UPDATE STATISTICS tablename indexname (for an index that you specifically suspect to be out of date)
And, then you execute the procedure again and yep, it runs correctly this time. So, you think, yes, it must have been the statistics!
However, what you may have seen is a side-effect of having updated statistics. When you update statistics, SQL Server usually* does plan invalidation. Therefore, the plan that was in cache was invalidated. When you executed again, you got a new plan. This new plan used parameter sniffing to see the parameters you used and then it came up with a more appropriate plan. So, it probably wasn't the statistics - it was the plan all along.
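To make the cause and effect easier to see, here is a deliberately simplified Python toy model of a plan cache with parameter sniffing. It is not how SQL Server's optimizer works internally; the row counts, the seek/scan threshold, and the procedure name `usp_GetOrders` are all hypothetical. It only shows that whichever parameter value arrives first after an invalidation decides the plan that every later call reuses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyPlanCache:
    """Toy model of parameter sniffing; not SQL Server's real optimizer."""

    row_estimates: dict[str, int]  # hypothetical rows returned per parameter value
    seek_threshold: int = 100  # hypothetical cut-off below which a seek is chosen
    plans: dict[str, str] = field(default_factory=dict)

    def execute(self, proc: str, param: str) -> str:
        if proc not in self.plans:
            # No cached plan: compile one for the value we "sniff" right now.
            rows = self.row_estimates[param]
            plan = "index seek + lookup" if rows < self.seek_threshold else "scan"
            self.plans[proc] = plan
        # Every later call reuses the cached plan, whatever its parameter.
        return self.plans[proc]

    def invalidate(self, proc: str) -> None:
        # Roughly what sp_recompile (and, usually, UPDATE STATISTICS) does.
        self.plans.pop(proc, None)
```

Note that calling `invalidate` alone changes which plan later calls get, even though the row estimates never change, which mirrors the point above about UPDATE STATISTICS appearing to fix the problem.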
So, what can you do?
First, do not use update statistics as your first response. If you have a procedure that's causing you grief, you should consider recompiling it to see if you can get a better plan. How? Use sp_recompile procedurename. This causes any plan in cache for that procedure to be invalidated. It is a quick and simple operation, and it will tell you whether or not you have a recompilation problem (and not a statistics problem). If you get a good plan, then you know that your stored procedure might need some "tweaking" to its code. I've outlined a few things that you can use to help you here: Stored procedures, recompilation and .NetRocks. If that doesn't work, then you MIGHT need to update statistics. What you should really do first, though, is make sure that the parameter values the plan was compiled with ARE the same as the values used at execution. If you use "show actual plan", you can see this by clicking the output/SELECT operator and checking the properties window (F4): under Parameter List, compare the Parameter Compiled Value with the Parameter Runtime Value.

This will confirm whether or not the execution used those values to compile the plan. If the compiled values match the execution values, then you might have a statistics problem. But statistics often get the blame when they're not actually the problem. It's the plan.
OK, there's a bit more to this...
*Do plans ALWAYS get invalidated when you update statistics? No...
Erin Stellato (blog | twitter) first blogged about this here: Statistics and Recompilation. And also here: Statistics and Recompilation, Part II.
Here's a quick summary though because it looks like things have changed again in SQL Server 2012...
This set of notes covers the main things that I think you need to know about working with git. It is not comprehensive and mainly serves as a reminder for myself.
Remotes
In a typical open source workflow using GitHub or BitBucket, you would fork the main repository into your own and then clone that copy to your local computer:
git clone git@github.com:akrabat/joind.in.git
You then need to connect your local repository to the main repository. By convention, the main repository is known as upstream:
cd joind.in
git remote add upstream https://github.com/joindin/joind.in.git
To sync your local repository with upstream and update your copy back on BitBucket/GitHub:
git checkout master
git fetch upstream
git merge --ff-only upstream/master
git push origin
If the merge fails, then something bad has happened and you'll need Google! One option is to force your master to match upstream using git reset --hard upstream/master and then git push --force origin.
Branching
The main branch in the repository is called master. Never ever code directly on master. Always create a branch and code on that and then merge back to master when complete.
(Note that the push command is also used throughout this document to sync your local repository with your remote on BitBucket/GitHub. If you don't want to publish, then don't push.)
Create a branch:
git checkout -b my-branch-name
git push origin my-branch-name
List all branches:
git branch -v
To list all remote branches too use the -a switch:
git branch -v -a
Change from one branch to another:
git checkout another-branch-name
Delete a branch:
git branch -D my-branch-name
git push origin :my-branch-name
Rebase master onto a branch:
If lots of people are working on the project, then master has probably changed a lot since you started. It's going to make your merge back to master much easier if you update your branch so that all your changes appear to be after all the changes on master. This is known as rebasing:
git fetch upstream
git checkout my-branch-name
git rebase -f upstream/master
git push -f origin my-branch-name
Bring in a remote branch to your local repository:
If you want to work with a branch that's on a remote repository, then you need to create your own tracking branch:
git fetch upstream
git checkout -t upstream/remote-branch-name
git push origin remote-branch-name
The name of your local branch will match the remote branch name.
Sync remote branch with local one
git fetch upstream
git checkout remote-branch-name
git merge upstream/remote-branch-name
git push origin
Committing
To commit a change:
First, stage the files that you want to commit by adding them to the index:
git add filename
You can then commit:
git commit -m "my commit message"
git push origin
Note that if you change a file after adding it to the index, the later changes will not be committed unless you git add the file again. For information about writing a good commit message, read A Note About Git Commit Messages by Tim Pope.
Merging
Merge a branch into master:
git checkout master
git fetch upstream && git merge --ff-only upstream/master
git merge --no-ff my-branch-name
Note that you don't need to commit anything. You do need to push, though:
git push
If there were conflicts, you need to resolve them by editing the files appropriately (look for <<<<<<<). At this point, you do need to commit your changes:
git add .
git commit
git push
Back out a conflicted merge:
git reset --hard HEAD
Working with your repository
To find out what's happened:
git reflog -10
This provides a list of the last 10 things that you've done on this repository across all branches.
git log --oneline -10
This provides a list of the last 10 commits on this branch.
In both cases, the first column contains the commit hash reference that uniquely identifies each commit.
To find out current commit:
git log -1
To undo all working changes
git checkout .
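To go back to an earlier commit:
Find the commit that you want to go back to via log or reflog, then reset to it. Note that --hard discards uncommitted changes and drops any later commits from the branch: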
git reset --hard abcdef0
You can also use the 'HEAD' number from reflog:
git reset --hard HEAD@{1}
To amend last commit message:
git commit --amend -m "New commit message"
It's best to do this before pushing. If you have pushed, then you need to force push using git push --force and expect to get lots of hassle from your co-workers.
Differences
To find out the differences between your edits and the last commit:
All files:
git diff
One file:
git diff -- my-filename
To find out the differences between master and the current branch for one file (omit the filename to compare all files):
git diff master..HEAD -- my-filename
To find out the differences between local master and origin's master:
git diff master origin/master
Some git aliases
Aliases provide a way to create new git commands and are usually used for creating shortcuts:
Type these from the command line:
$ git config --global alias.st status
$ git config --global alias.staged 'diff --staged'
$ git config --global alias.unstage 'reset HEAD --'
$ git config --global alias.last 'log -1 HEAD'
This gives you some new git commands:
- git st => view current status
- git staged => view the diff of what is currently staged
- git unstage filename => Remove filename from the staging area
- git last => view last commit
Several years ago, some suspected cyber criminals on the Internet wrote a family of malware dubbed DNSChanger. About a year ago, law enforcement tracked down the suspected cyber criminals behind this malware, arrested them, and took over the servers they were using to redirect customers to rogue sites.
As a result of a court order, the Internet Systems Consortium (ISC), under the direction of the FBI, has continued to run the DNS servers used by the malware for the last year. However, the court order will soon expire and those servers are scheduled to be shut down on July 9, 2012. When that happens, hundreds of thousands of Internet users whose systems are still infected and/or affected could lose access to the web, email, and anything else that depends on DNS. This is the story of how two Internet infrastructure startups, CloudFlare and OpenDNS, are playing a small part to help solve the problem.
A Bit of DNS Background
Up front, in order to understand this story, you need to understand there are two types of DNS servers: recursive and authoritative. Everyone who uses the Internet needs a recursive DNS server. Your ISP usually provides these types of services or you can use a provider like OpenDNS, Google, DNSAdvantage, other public resolvers, or even run a server yourself to handle your recursive DNS queries.
On the other hand, every domain needs at least one authoritative DNS server. Authoritative servers are where a particular domain's records are hosted and published. Many domain registrars provide authoritative DNS servers, or you can use a service like CloudFlare; we provide authoritative DNS. When an Internet user types a Uniform Resource Locator (URL) into their browser, clicks on a link, or sends an email, their computer queries their recursive DNS provider. If the recursive DNS provider has the answer cached, then it responds. If it doesn't have the answer cached, or if the answer it has is stale, then the recursive DNS server queries the authoritative DNS server.
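The cache-or-ask-upstream behaviour of a recursive resolver can be sketched as a toy model in JavaScript. It is only illustrative. Real resolvers walk the DNS hierarchy starting at the root servers and speak the DNS wire protocol. Here, `queryAuthoritative` is a hypothetical stand-in for that whole process, and the TTL handling is simplified.

```javascript
// Toy model of a caching recursive resolver (illustrative only).
class CachingResolver {
  // queryAuthoritative(name) -> { address, ttlSeconds } (hypothetical);
  // now() -> current time in milliseconds, injectable for testing.
  constructor(queryAuthoritative, now = () => Date.now()) {
    this.queryAuthoritative = queryAuthoritative;
    this.now = now;
    this.cache = new Map(); // name -> { address, expiresAt }
  }

  resolve(name) {
    const cached = this.cache.get(name);
    if (cached && cached.expiresAt > this.now()) {
      // Fresh answer in the cache: respond without asking upstream.
      return { address: cached.address, fromCache: true };
    }
    // Missing or stale: ask the authoritative side and cache the answer.
    const { address, ttlSeconds } = this.queryAuthoritative(name);
    this.cache.set(name, { address, expiresAt: this.now() + ttlSeconds * 1000 });
    return { address, fromCache: false };
  }
}

// Usage with a fake authoritative zone and a controllable clock.
let clock = 0;
const zone = { "example.com": { address: "192.0.2.10", ttlSeconds: 300 } };
const resolver = new CachingResolver((name) => zone[name], () => clock);
resolver.resolve("example.com"); // fromCache: false (cache miss)
resolver.resolve("example.com"); // fromCache: true (fresh cached answer)
clock = 301 * 1000;
resolver.resolve("example.com"); // fromCache: false (answer went stale)
```

In this model, whoever controls the resolver controls the answers. That is the lever DNSChanger pulled by pointing victims at resolvers that could return attacker-chosen addresses.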
As mentioned above, OpenDNS provides recursive DNS. Their customers are web surfers and they provide a terrific service that helps speed up Internet browsing and protect people on the web from malware. CloudFlare provides authoritative DNS. Our customers are websites and we make those sites faster and protect them from attacks directed at them. While we're often asked if OpenDNS and CloudFlare are competitors, in reality the two services are complementary, simply using different parts of DNS (recursive and authoritative) to achieve a similar mission: a faster, safer, better Internet.
How Suspected Cyber Criminals Use DNS to Do Bad Things
The DNSChanger malware family was designed to change the recursive DNS server that Internet users’ computers query. Instead of directing DNS queries at the recursive server you or your ISP configured, the malware modified computer settings to route queries to recursive DNS servers controlled by the suspected cyber criminals.
The job of DNS is to translate a domain name such as dcwg.org, which humans prefer, into an IP address, like 108.162.205.64, which servers and routers can use. If you are a cyber criminal and you can gain control over someone’s recursive DNS, then you can direct traffic intended for certain sites to fake versions of those sites. Once DNSChanger had web surfers querying rogue recursive DNS servers, all requests for legitimate websites could be directed to a fake website. For example, even if you typed your bank's domain name into your browser, if the suspected cyber criminals control recursive DNS, then they can send you to a malicious site and steal your information.
Over the years that DNSChanger operated unchecked, more than a million computers and home routers had their DNS configurations modified. Thankfully, law enforcement was able to track down the suspected cyber criminals behind the malware, arrest them, and seize control of the rogue recursive DNS servers. Unfortunately, hundreds of thousands of computers are still using the formerly rogue recursive DNS servers. On July 9, 2012 the court order directing ISC to operate the servers expires and those servers are scheduled to be shut down. On that date, all systems which still have their DNS settings modified by DNSChanger will effectively be cut off from the Internet.
Getting the Word Out
The DNSChanger Working Group (DCWG), a loosely affiliated organization composed of some of the world’s largest and most competent ISPs, search engine vendors, software vendors, security companies, and others, has been working to get the word out about the problem and reduce the impact of the shutdown of the DNSChanger recursive servers. The DCWG launched a website (dcwg.org) to provide information about the malware, let people test whether they are infected, and provide recommendations on how to fix their systems. CloudFlare first became involved when the folks at dcwg.org reached out to us because their site was under heavy load after attention from major media outlets. CloudFlare helped keep the dcwg.org website online under the load caused by media attention over the last 10 days. We offloaded more than 95% of the traffic to the site, ensuring the site ran fast and stable even when it was being featured on the front page of cnn.com.
Unfortunately, one of the challenges in trying to address situations like DNSChanger is that you only know to go to the dcwg.org website if you already know about it. What you needed was something akin to an emergency broadcast system that would inform people who were infected that they had a problem as they surfed the web. In the process of working with the DCWG, we realized we might be able to help.
Some of our engineers created an app named Visitor DNSChanger Detector App. Any website on CloudFlare can enable the app with a single click from our apps marketplace. The app installs a small bit of Javascript on the page that tests visitors to see if they're infected. If the tests do not detect anything, nothing happens. If the tests indicate that the DNSChanger recursive servers are being used, then a banner is displayed across the top of the page and visitors are directed to instructions on how to clean up the infection (more on that in a second).
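A related, manual check is to compare the DNS server addresses in your network settings against the IP ranges known to belong to the rogue servers. This is not how the browser-based app works, since page JavaScript cannot read the operating system's DNS settings. The JavaScript sketch below does the range comparison. The ranges it uses are RFC 5737 documentation placeholders, not the real DNSChanger ranges, and you would substitute the ranges published by the DCWG. All function names are hypothetical.

```javascript
// PLACEHOLDER ranges (RFC 5737 documentation blocks), NOT the real
// DNSChanger ranges; substitute the ranges published by the DCWG.
const PLACEHOLDER_ROGUE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"];

// Convert dotted-quad IPv4 text to an unsigned integer.
function ipv4ToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  // Multiplication instead of bit shifts avoids signed 32-bit overflow.
  return ((parts[0] * 256 + parts[1]) * 256 + parts[2]) * 256 + parts[3];
}

// True if ip falls inside the CIDR block, e.g. "192.0.2.0/24".
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const size = 2 ** (32 - Number(bitsText)); // addresses in the block
  const start = Math.floor(ipv4ToInt(base) / size) * size;
  const addr = ipv4ToInt(ip);
  return addr >= start && addr < start + size;
}

// Return the configured resolver addresses that fall in a rogue range.
function rogueResolvers(resolverIps, ranges = PLACEHOLDER_ROGUE_RANGES) {
  return resolverIps.filter((ip) => ranges.some((cidr) => inCidr(ip, cidr)));
}

console.log(rogueResolvers(["8.8.8.8", "192.0.2.53"])); // [ '192.0.2.53' ]
```

A non-empty result would mean the machine (or its home router) is pointing at a suspect resolver and needs cleaning up.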
More than 470 million people pass through CloudFlare's network on a monthly basis. Our data suggest that more than half of the people infected with DNSChanger would visit at least one site on CloudFlare per month. The power of the Visitor DNSChanger Detector App is that the more CloudFlare publishers enable it, the more likely it is that people who are infected will learn about their infection before they lose the ability to use the Internet on July 9, 2012.
While we've made it extremely easy for publishers on CloudFlare's network to help get the word out, we didn't want to restrict participation to only those sites using our service. We therefore decided to release the code for the checks publicly and as open source so anyone who can install a few lines of Javascript on their web pages will be able to install it on their own sites to inform their potentially infected users. You can access the code from the following GitHub Repo. We're hopeful that sites both large and small will take the time to install the code in order to help inform their visitors who may be infected.
What Should People Notified of This Infection Do?
While CloudFlare is able to assist with informing web surfers that they have an infection, we aren’t particularly well situated to actually fix the problem. After all, it isn’t our customers that are directly impacted, but rather the customers of our customers. Many of the folks infected can get help from their ISPs, but for some this might not be an option. CloudFlare reached out to David Ulevitch, the CEO of OpenDNS, and he saw this as a great opportunity to further OpenDNS's mission of helping build a better Internet. We added OpenDNS as a resource for publishers to display to their customers when the Javascript detects the use of the DNSChanger recursive servers.
The Power of the DNS
This incident illustrates to me the importance and power of the DNS system that underpins the Internet. The suspected cyber criminals were able to modify DNS settings to steal advertising revenue and perform other illegal activities. CloudFlare uses authoritative DNS in order to provision powerful tools to make sites faster and even help create a sort of emergency warning system for the Internet. OpenDNS provides high performance recursive DNS caching services for their customers. Combined, we hope to help the DCWG get the word out so the hundreds of thousands of Internet users still impacted by the DNSChanger malware will be able to take steps to ensure they’ll be able to use the Internet on July 10, 2012 and beyond.
Today Amazon Web Services launched AWS Marketplace, an online store that makes it easy for you to find, buy, and immediately start using software and services that run on the AWS Cloud. You can use AWS Marketplace’s 1-Click deployment to quickly launch pre-configured software on your own Amazon EC2 instances and pay only for what you use, by the hour or month. AWS handles billing and payments, and software charges appear on your AWS bill.

Marketplace has software listings from well-known vendors including 10gen, CA, Canonical, Couchbase, Check Point Software, IBM, Microsoft, SAP, Zend, and others, as well as many widely used open source offerings including Wordpress, Drupal, and MediaWiki.
AWS Marketplace brings the same simple and trusted online shopping experience that customers enjoy on Amazon.com to software built for the AWS platform, streamlining the process of doing research and purchasing software. It features a wide selection of development and business software, including software infrastructure, developer tools, and business applications. Product prices are clearly stated and appear on the same bill as your other AWS services.
AWS Marketplace also simplifies many of the challenges software companies face, such as acquiring customers, developing distribution channels, and billing for their software.
Why shop here?
The way businesses are buying applications is changing. There is a new generation of leaders that have very different expectations about how they can select the products and tools they need to be successful. Last week I met with a CIO for a discussion about how her IT department can use AWS to help their business units become more agile and move faster. One of the stumbling blocks she mentioned was how to select the best software running on AWS, in a way that was completely in line with the “Cloud Experience”: no software to install, no sales cycle, no procurement delays, and a selection of licensing models to choose from. She jokingly asked for an “Amazon 1-Click” experience for software. I am sure she will be a very happy CIO today.
AWS Marketplace features a wide selection of commercial and free IT and business software. AWS Marketplace enables you to compare options, read reviews, and quickly find the software you want.
We wanted to shrink the time between finding what you want and getting it up and running. Once you find software you like, you can deploy that software to your own EC2 instance with 1-Click, as the CIO suggested, or by using popular management tools like the AWS Console.
In addition, for most products, software prices are clearly posted on the website so you can purchase software immediately, with the payment instrument you already have on file with Amazon Web Services. Software charges appear on the same monthly bill as your AWS infrastructure charges.
Why sell here?
Amazon Web Services has helped create a great ecosystem of ISVs that sell software and services to other customers running in the cloud. It has had a true democratization effect: the dominant vendor in a market no longer automatically gets chosen. Many IT decision makers ask me which young and exciting companies they should be paying attention to: which companies have a native cloud product, which have innovative new licensing models, and which young and hungry companies break with the old style of enterprise software vending and are truly customer-centric. At the same time, up-and-coming companies often ask me how we can help them get in front of more customers so that they can compete in an open and honest way. They also often ask whether we can help them with what Amazon.com does so well for its sellers: handling billing and charging.
AWS Marketplace includes both large, well known companies as well as exciting up and coming companies. If you’re a software provider with an offering that runs on the AWS cloud, you can gain new customers, enable usage-based billing without much additional work, and ensure that customers have a fast and easy deployment experience with their software.
AWS Marketplace helps software and SaaS providers find new customers by exposing their products to some of the hundreds of thousands of AWS customers, ranging from individual software developers to large enterprises.
Additionally, if you are interested in adding hourly billing to your software, AWS Marketplace can help. Simply upload an Amazon Machine Image to AWS and provide the hourly cost. Billing is managed by AWS Marketplace, relieving sellers of the responsibility of metering usage, managing customer accounts, and processing payments, leaving software developers more time to focus on building great software.
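The post does not spell out AWS Marketplace's metering rules. As a rough illustration of usage-based hourly billing, here is a minimal JavaScript sketch that turns metered usage and a seller-set hourly price into a charge. The round-up-to-whole-hours rule and the function name are assumptions, not documented Marketplace behaviour. Prices are kept in integer cents to avoid floating-point rounding.

```javascript
// Hypothetical usage-based billing sketch.
// ASSUMPTION: partial hours are billed as whole hours.
function softwareChargeCents(usageSeconds, hourlyPriceCents) {
  const billedHours = Math.ceil(usageSeconds / 3600);
  return billedHours * hourlyPriceCents;
}

// An instance that ran 10 hours 15 minutes, software priced at $0.25/hour.
const cents = softwareChargeCents(10 * 3600 + 15 * 60, 25);
console.log(`$${(cents / 100).toFixed(2)}`); // $2.75
```

From the seller's side, the point of the marketplace is that this metering and the collection of the resulting charge are handled for them.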
Summary
At Amazon we have a long experience with buyers and sellers in a marketplace. We know that something great happens when you solve problems for both the people selling things and those buying things – the market becomes more and more vibrant. We know that for buyers it is important to have very convenient ways of discovering and buying products. For sellers it is important to get their products in front of as many relevant customers as possible and make the sales process as painless as possible.
But more important than anything else for both parties is trust: easy-to-understand product information, high-quality and relevant reviews by other customers, a seller who is reputable and has a history of delivery, and the assurance that the buyer will only be charged for their exact usage. For the seller, it removes the burden of managing customers, measuring their usage, and collecting payments for it.
The AWS Marketplace is a great step forward in making it easier to buy and deploy software. It also makes it dead simple for ISVs to add hourly billing to their offerings and get their software in front of hundreds of thousands of active AWS customers.
For more information, see the announcement at the AWS website, the "Introducing AWS Marketplace" video, and the posting on the AWS blog, and of course visit the AWS Marketplace for a test drive. Happy shopping!
Over the past several years I’ve spent much of my time traveling around the world speaking about distributed systems. From building infinitely scalable data stores, architectures for high performance computing, to the challenges imposed by the CAP theorem, there are wonderful, complex, fascinating problems to be solved in the area of distributed computing. During my travels I’ve met thousands of brilliant engineers who are leveraging the cloud to deliver exciting new products and revolutionize IT as we know it. One thing that’s become obvious to me is that there are innovative, inspiring developers in every corner of the planet from Australia to Iceland and from Israel to Peru.
And that leads me to another distributed problem – finding good engineers to help AWS build the next generation of cloud computing services. We’ve got a big vision and to realize it we need to find qualified engineers to join us on our journey. A quick look at the AWS career web sites reveals that we are hiring hundreds of people around the world.
Click here for our current job openings in the U.S.
Click here for our current job openings in Europe, Asia, and South Africa
Distributed problems call for innovative solutions. So next month we will be taking a distributed approach to finding engineers who want to join AWS. On May 17th and 18th we will be traveling to Houston, Minneapolis, and Nashville to interview candidates who want to join the AWS team. If you live in or near one of those cities and are interested in a meeting with us about careers in AWS check out this page. You can also simply email your resume to aws-recruiting@amazon.com
Today Amazon Web Services is introducing Amazon CloudSearch, a new web service that brings the power of Amazon.com’s search technology to every developer. Amazon CloudSearch provides a fully-featured search engine that is easy to manage and scale. It offers full-text search with features like faceting and user-defined rank functions. And like most AWS services, Amazon CloudSearch scales automatically as your data and traffic grow, making it an easy choice for applications small to large. With Amazon CloudSearch, developers just create a Search Domain, upload data, and start querying.
Why Search?
Search is an essential part of many of today's cloud-centric applications. While in our daily lives we are mostly familiar with the search functionality offered by web search, there are in fact many more cases where search is a fundamental component of an application. Search is a much broader technology than just the indexing of large collections of web pages. Many organizations have large collections of documents, structured and unstructured, that can benefit from a specialized search service. With the rise of the App developer culture there is an increasing number of consumer data sources that cannot be simply queried with a web search engine. Using specialized ranking functions these apps can give their customers a highly specialized search experience.
And increasingly, search is applied to data that, though called a "document" for the purposes of search, is really just a record in a database or an object in a NoSQL system. On the query side, we are used to seeing search results as users, but search results are increasingly being used at the core of complex distributed systems where the results are consumed by machines, not people.
With these applications in mind, our customers have told us that a cloud-based managed search service is high on their wish lists. Their main motivation is that existing search technologies, both commercial and open source, have proven to be hard to manage and complex to configure.
Amazon CloudSearch will have a democratizing effect, as it offers features that have been out of reach for many customers. With Amazon CloudSearch, a powerful search engine is now in the hands of every developer, at our familiar low prices, using a pay-as-you-go model. It will allow developers to improve the functionality of their products at lower cost, with almost zero administration. It is very simple to get started: customers can create a Search Domain, upload their documents, and immediately start querying.
How it Works
Developers set up a Search Domain: a set of resources in AWS that serves as the home for one collection of data. Developers then access their domain through two HTTP-based endpoints: a document upload endpoint and a query endpoint. As developers send documents to the upload endpoint, the documents are quickly incorporated into the index and become searchable.
Developers can upload data either through the AWS console, from the command-line tools, or by sending their own HTTP POST requests to the upload endpoint.
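To make the upload-then-query flow concrete, here is a toy in-memory stand-in written in JavaScript. It is not the CloudSearch API, and the class and method names are hypothetical. `upload()` plays the role of the document upload endpoint and `search()` the role of the query endpoint. The inverted index (term to document ids) is one common way such a service makes documents searchable. CloudSearch's internals are not described here.

```javascript
// Toy stand-in for a search domain (NOT the CloudSearch API).
class ToySearchDomain {
  constructor() {
    this.docs = new Map();  // id -> document
    this.index = new Map(); // term -> Set of document ids (inverted index)
  }

  static tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9]+/g) || [];
  }

  // "Upload endpoint": store the document and index its terms.
  // (Re-uploading an id does not remove its old terms in this toy.)
  upload(id, doc) {
    this.docs.set(id, doc);
    for (const term of ToySearchDomain.tokenize(doc.text)) {
      if (!this.index.has(term)) this.index.set(term, new Set());
      this.index.get(term).add(id);
    }
  }

  // "Query endpoint": ids of documents containing every query term.
  search(query) {
    const terms = ToySearchDomain.tokenize(query);
    if (terms.length === 0) return [];
    let result = [...(this.index.get(terms[0]) || [])];
    for (const term of terms.slice(1)) {
      const ids = this.index.get(term) || new Set();
      result = result.filter((id) => ids.has(id));
    }
    return result;
  }
}

const domain = new ToySearchDomain();
domain.upload("1", { text: "Red umbrella, compact" });
domain.upload("2", { text: "Blue umbrella" });
console.log(domain.search("umbrella"));     // [ '1', '2' ]
console.log(domain.search("red umbrella")); // [ '1' ]
```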
There are three features that make it easy to configure and customize the search results to meet exactly the needs of the application.
Filtering: Conceptually, this is using a match in a document field to restrict the match set. For example, if documents have a "color" field, you can filter the matches for the color "red".
Ranking: Search has at least two major phases: matching and ranking. The query specifies which documents match, generating a match set. After that, scores are computed (or a direct sort criterion is applied) for each of the matching documents to rank them best to worst. Amazon CloudSearch lets you define customized ranking functions to fine-tune the search results.
Faceting: Faceting allows you to categorize your search results into refinements on which the user can further search. For example, a user might search for ‘umbrellas’, and facets allow you to group the results by price, such as $0-$10, $10-$20, $20-$40, etc. Amazon CloudSearch also allows for result counts to be included in facets, so that each refinement has a count of the number of documents in that group. The example could then be: $0-$10 (4 items), $10-$20 (123 items), $20-$40 (57 items), etc.
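The three features can be sketched together over a small in-memory product list. This is plain JavaScript, not the CloudSearch API, and the data, field names, and helper names are hypothetical. Treating "$0-$10, $10-$20, $20-$40" as half-open ranges [0, 10), [10, 20), [20, 40) is an assumption about where boundary prices fall.

```javascript
// Illustrative sketch (NOT the CloudSearch API) of matching, filtering,
// ranking with a caller-supplied function, and faceting with counts.
const products = [
  { id: "a", title: "compact umbrella", color: "red", price: 8, popularity: 5 },
  { id: "b", title: "golf umbrella", color: "red", price: 35, popularity: 9 },
  { id: "c", title: "umbrella stand", color: "black", price: 15, popularity: 2 },
  { id: "d", title: "rain boots", color: "red", price: 25, popularity: 7 },
];

// Matching: documents whose title contains the query word.
const matches = (docs, word) => docs.filter((d) => d.title.split(" ").includes(word));

// Filtering: restrict the match set to documents with a given field value.
const filterBy = (docs, field, value) => docs.filter((d) => d[field] === value);

// Ranking: sort best to worst by a user-defined scoring function.
const rank = (docs, score) => [...docs].sort((x, y) => score(y) - score(x));

// Faceting: count documents in each half-open price range [lo, hi).
function priceFacets(docs, bounds) {
  const facets = [];
  for (let i = 0; i + 1 < bounds.length; i++) {
    const lo = bounds[i];
    const hi = bounds[i + 1];
    const count = docs.filter((d) => d.price >= lo && d.price < hi).length;
    facets.push({ range: `$${lo}-$${hi}`, count });
  }
  return facets;
}

const hits = matches(products, "umbrella");        // a, b, c
const redHits = filterBy(hits, "color", "red");    // a, b
const ranked = rank(redHits, (d) => d.popularity); // b, a
console.log(ranked.map((d) => d.id));              // [ 'b', 'a' ]
console.log(priceFacets(hits, [0, 10, 20, 40]));
```

Facet counts are computed over the match set, not the whole collection, which is what lets a user see how many results each refinement would leave.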
For more information on the different configuration possibilities visit the Amazon CloudSearch detail page.
Automatic Scaling
Amazon CloudSearch is itself built on AWS, which enables it to handle scale.

Amazon CloudSearch supports both horizontal and vertical scaling. The main search index is kept in memory to ensure that requests can be served at very high rates. As developers add data, CloudSearch increases either the size of your underlying node or it increases the number of nodes in the cluster. To handle growing request rates, the service autoscales the number of instances handling queries.
Amazon CloudSearch is based on more than a decade of developing high-quality search technologies for Amazon.com. It has been developed by A9, the Amazon.com subsidiary that focuses on search technologies. The technology that is used at all the different places where you can search on Amazon.com is also at the core of Amazon CloudSearch.
Summary
With the launch of Amazon CloudSearch the Amazon Web Services remove yet another pain point for developers. Almost every application these days needs some form of search and as such every developer has to spend significant time implementing it. With Amazon CloudSearch developers can now simply focus on their application and leave the management of search to the cloud.
For more information see the Amazon CloudSearch detail pages, the Amazon CloudSearch Developer Guide and the posting on the AWS developer blog.
You can sign up for the Introduction To Amazon CloudSearch webinar on May 10.
In yesterday’s blog post, Making the HTTP Archive faster, one of the biggest speedups came from not using a script loader. It turns out that script loader was using document.write to load scripts dynamically. I wrote about the document.write technique in Loading Script Without Blocking back in April 2009, as well as in Even Faster Web Sites (chapter 4). It looks something like this:
document.write('<script src="' + src + '" type="text/javascript"></script>');
The problem with document.write for script loading is:
- Every DOM element below the inserted script is blocked from rendering until the script is done downloading.
- It blocks other dynamic scripts. One exception is if multiple scripts are inserted using document.write within the same SCRIPT block.
Because the script loader was using document.write, the page I was optimizing rendered late and other async scripts in the page took longer to download. I removed the script loader and instead wrote my own code to load the script asynchronously following the createElement-insertBefore pattern popularized by the Google Analytics async snippet:
var sNew = document.createElement("script");
sNew.async = true;
sNew.src="http://ajax.googleapis.com/ajax/libs/jquery/1.5.1/jquery.min.js";
var s0 = document.getElementsByTagName('script')[0];
s0.parentNode.insertBefore(sNew, s0);
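The snippet above can be wrapped in a reusable helper. In this sketch the function name `loadScriptAsync` and the `doc` parameter are additions that are not part of the Google Analytics snippet. The document is passed in so the logic can also be exercised outside a browser. In a page you would call `loadScriptAsync(document, url)`.

```javascript
// createElement-insertBefore pattern as a helper (name is hypothetical).
function loadScriptAsync(doc, src) {
  const script = doc.createElement("script");
  script.async = true;
  script.src = src;
  // Insert before the first existing SCRIPT element. When this runs from
  // an inline script, at least one SCRIPT element is guaranteed to exist,
  // and its parentNode works whether it lives in HEAD or BODY.
  const first = doc.getElementsByTagName("script")[0];
  first.parentNode.insertBefore(script, first);
  return script;
}
```

Inserting a script element this way does not block rendering of the rest of the page.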
Why does using document.write to dynamically insert scripts produce these bad performance effects?
It’s really not surprising if we walk through it step by step. We know that loading scripts using normal SCRIPT SRC= markup blocks rendering for all subsequent DOM elements. And we know that the markup written by document.write is parsed as soon as the writing script finishes executing, before the parser resumes with the rest of the page. Therefore, the document.write technique inserts a script using normal SCRIPT SRC= markup, which blocks the rest of the page from rendering.
On the other hand, scripts inserted using the createElement-insertBefore technique do not block rendering. In fact, if document.write generated a createElement-insertBefore snippet then rendering would also not be blocked.
At the bottom of my Loading Script Without Blocking blog post is a decision tree to help developers choose which async technique to use under different scenarios. If you look closely you’ll notice that document.write is never recommended. A lot of things change on the Web, but that advice was true in 2009 and is still true today.
This week I finally got time to do some coding on the HTTP Archive. Coincidentally (ironically?) I needed to focus on performance. Hah! This turned out to be a good story with a few takeaways – info about the HTTP Archive, some MySQL optimizations, and a lesson learned about dynamic script loaders.
Setting the stage
The HTTP Archive started in November 2010 by analyzing 10K URLs and storing their information (subresource URLs, HTTP headers, sizes, etc.) in a MySQL database. We do these runs twice each month. In November 2011 we began increasing the number of URLs to 25K, 50K, 75K, and finally hit 100K this month. Our goal is to hit 1M URLs by the end of 2012.
The MySQL schema in use today is by-and-large the same one I wrote in a few hours back in November 2010. I didn’t spend much time on it – I’ve created numerous databases like this and was able to quickly get something that got the job done and was fast. I knew it wouldn’t scale as the size of the archive and number of URLs grew, but I left that for another day.
That day had arrived.
DB schema
The website was feeling slow. I figured I had reached that curve in the hockey stick where my year-old schema that worked on two orders of magnitude less data was showing its warts. I saw plenty of slow queries in the log. I occasionally did some profiling and was easily able to identify queries that took 500 ms or more; some even took 10+ seconds. I’ve built big databases before and had some tricks up my sleeve so I sat down today to pinpoint the long poles in the tent and cut them down.
The first was pretty simple. The urls table has over 1M URLs. The only index was based on the URL string – a blob. It took 500-1000 ms to do a lookup. The main place this happens is looking up the URL’s rank, for example, in the last crawl Whole Foods was ranked 5,872 (according to Alexa). This is a fairly non-critical piece of information, so slowing down the page 500-1000 ms wasn’t acceptable. Plus this seems like a simple lookup ripe for optimizing.
When I described this problem to my Velocity co-chair, John Allspaw, he suggested creating a hash for the URL that would be faster to index. I understood the concept but had never done this before. I didn’t find any obvious pointers out there on “the Web” so I rolled my own. I started with md5(), but that produced a fairly long string that was alphanumeric (hex):
select md5("http://www.wholefoodsmarket.com/");
=> 0a0936fe5c690a3b468a6895efaaff83
I didn’t think it would be that much faster to index off the md5() hex string (although I didn’t test this). Assuming that md5() strings are evenly distributed, I settled on taking a substring:
select substring(md5("http://www.wholefoodsmarket.com/"), 1, 4);
=> 0a09
This was still hex and I thought an int would be a faster index (but again, I didn’t test this). So I added a call to conv() to convert the hex to an int:
select conv(substring(md5("http://www.wholefoodsmarket.com/"), 1, 4), 16, 10);
=> 2569
I was pretty happy. This maps URLs across 64K hashes. I’m assuming they’re evenly distributed. This conversion is only done a few times per page so the overhead is low. If you have a better solution please comment below, but overall I thought this would work – and it did! Those 500+ ms queries went down to < 1 ms. Yay!
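The same hashing idea can be expressed in JavaScript with Node's built-in `crypto` module: take the md5 of the URL, keep the first 4 hex digits, and convert them to an integer between 0 and 65535. This mirrors the SQL above and is not the HTTP Archive's actual code. One consequence is worth spelling out. With about 1M URLs spread over 65,536 buckets, each bucket holds roughly 15 URLs on average, so a lookup has to check the full URL as well as the hash. In SQL terms, that means querying a (hypothetical) indexed hash column together with a `url = ...` condition. The sketch models that two-step lookup with an in-memory Map.

```javascript
const crypto = require("crypto");

// md5 the URL, keep the first 4 hex digits, convert to an int (0-65535).
function urlHash(url) {
  const hex = crypto.createHash("md5").update(url).digest("hex");
  return parseInt(hex.slice(0, 4), 16);
}

// Group rows by hash, like an index on a hypothetical hash column.
function buildIndex(rows) {
  const buckets = new Map(); // hash -> array of rows
  for (const row of rows) {
    const h = urlHash(row.url);
    if (!buckets.has(h)) buckets.set(h, []);
    buckets.get(h).push(row);
  }
  return buckets;
}

// Jump to the bucket by hash, then compare the full URL to skip collisions.
function lookupRank(buckets, url) {
  const row = (buckets.get(urlHash(url)) || []).find((r) => r.url === url);
  return row ? row.rank : undefined;
}

const index = buildIndex([{ url: "http://www.wholefoodsmarket.com/", rank: 5872 }]);
console.log(lookupRank(index, "http://www.wholefoodsmarket.com/")); // 5872
```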
But the page was still slow. Darn!
Duh – it’s the frontend
This and a few other MySQL changes shaved a good 2-3 seconds off the page load time, but the page still felt slow. The biggest problem was rendering: I could tell the page arrived quickly but something was blocking the rendering. This is more familiar performance territory for me, so I gleefully rolled up my sleeves and pulled out my WPO toolbox.
The page being optimized is viewsite.php. I used WebPagetest to capture a waterfall chart and screenshots for Chrome 18, Firefox 11, IE 8, and IE 9. The blocking behavior and rendering times were not what I consider high performance.
These waterfall charts looked really wrong to me. The start render times (green vertical line) were all too high: Chrome 1.2 seconds, Firefox 2.6 seconds, IE8 1.6 seconds, and IE9 2.4 seconds. Also, too many resources were downloading and potentially blocking start render. This page has a lot of content, but most of the scripts are loaded asynchronously and so shouldn’t block rendering. Something was defeating that optimization.
Docwrite blocks
I immediately homed in on jquery.min.js because it was often in the critical path or appeared to push out the start render time. I saw in the code that it was being loaded using the Google Libraries API. Here’s the code that was being used to load jquery.min.js:
<script src="http://www.google.com/jsapi"></script>
<script>
google.load("jquery", "1.5.1");
</script>
I’ve looked at (and built) numerous async script loaders and know there are a lot of details to get right, so I dug into the jsapi script to see what was happening. I saw the typical createElement-insertBefore pattern popularized by the Google Analytics async snippet. But upon walking through the code I discovered that jquery.min.js was being loaded by this line:
m.write('<script src="'+b+'" type="text/javascript"></script>');
The jsapi script was using document.write to load jquery.min.js. While it’s true that document.write has some asynchronous benefits, it’s more limited than the createElement-insertBefore pattern. Serendipitously, I was just talking with someone a few weeks ago about deprecating the jsapi script because it introduces an extra HTTP request, and instead recommending that people just load the script directly. So that’s what I did.
We don’t need no stinkin’ script loader
In my case I knew that jquery.min.js could be loaded async, so I replaced the google.load code with this:
var sNew = document.createElement("script");
sNew.async = true;
sNew.src="http://ajax.googleapis.com/ajax/libs/jquery/1.5.1/jquery.min.js";
var s0 = document.getElementsByTagName('script')[0];
s0.parentNode.insertBefore(sNew, s0);
This made the start render times and waterfall charts look much better:
[Waterfall charts after the change: Chrome 18, Firefox 11, Internet Explorer 8, Internet Explorer 9]
There was better parallelization of downloads and the start render times improved. Chrome went from 1.2 to 0.9 seconds. Firefox went from 2.6 to 1.3 seconds. IE8 went from 1.6 to 1.1 seconds. IE9 went from 2.4 to 1.0 seconds.
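To put the before and after numbers on a common scale, this short JavaScript snippet computes the percentage reduction in start render time for each browser. The inputs are the figures quoted above, and the percentages are derived arithmetic, not additional measurements.

```javascript
// Start render times in seconds [browser, before, after], from the text.
const startRender = [
  ["Chrome 18", 1.2, 0.9],
  ["Firefox 11", 2.6, 1.3],
  ["IE 8", 1.6, 1.1],
  ["IE 9", 2.4, 1.0],
];

// Percentage reduction in start render time.
function reductionPercent(before, after) {
  return ((before - after) / before) * 100;
}

for (const [browser, before, after] of startRender) {
  const pct = reductionPercent(before, after).toFixed(0);
  console.log(`${browser}: ${before.toFixed(1)}s -> ${after.toFixed(1)}s (${pct}% less)`);
}
```

By this measure IE9 improved the most (about 58%) and Chrome the least (25%).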
This was a fun day spent making the HTTP Archive faster. Even though I consider myself a seasoned veteran when it comes to web performance, I still found a handful of takeaways including some oldies that still ring true:
- Even for web pages that have significant backend delays, don’t forget to focus on the frontend. After all, that is the Performance Golden Rule.
- Be careful using script loaders. They have to handle diverse script loading scenarios across a large number of browsers. If you know what you want it might be better to just do it yourself.
- Be careful using JavaScript libraries. In this case jquery.min.js is only being used for the drop-down About menu. That’s 84K (~30K compressed) of JavaScript for a fairly simple behavior.
If you’re curious about why document.write results in worse performance for dynamic script loading, I’ll dig into that in tomorrow’s blog post. Hasta mañana.
What a crazy few weeks. Since my crazy, fuck you to homophobia coming out post almost a month ago, I seem to have unintentionally gone into full overdrive with the social issues in technology culture posts: discussing how sexism in technology conferences insults everybody by making women and gay men invisible, and by portraying straight men as stupid, misogynistic idiots who think only with their dicks. Then there was the Brendan Eich post. Which wasn’t really about Brendan Eich so much as about whether it’s legitimate to call Eich out and the tired old trope about how the oppressed become the oppressors by, err, talking about homophobia on Twitter. Then to round it off, a post on the reaction to Ryan Funduk’s post about drinking in tech culture.
I wanted to share a little reading list: Derailing for Fun and Profit by Peter Aronoff, which deals with the ever so tiresome response “oh, but they have a right to their opinion”. Which is really a big old red herring. Sure, people have the right to an opinion. You have the right in the strict legal sense to believe that Queen Elizabeth II is actually a shape-shifting lizard from the Draco constellation. Only I also have the right to consider that absolutely fucking crazy and to think that you are off your rocker.
Chris Heilmann has written a post discussing whether Twitter is a good place to have these kinds of arguments, and includes the excellent TEDx video from Jay Smooth on racism.
Chris is obviously right about the potential for grandstanding and sloganeering on platforms like Twitter. But, I think he goes too far. DISREGARD THAT, I’m an idiot. Chris wasn’t saying what I thought he was saying. Apologies.
I think that with a few obvious limits,[1] honesty is a better policy than hushing things up in order to give outsiders the view that the tech community is free of disagreement. However painful talking about things like sexism and homophobia and the social issues around geek culture can be, it’s better to have the conversation than keep quiet.
Finally, read Natalie Reed’s Hipster Misogyny. Because the reaction to the Boston API Hackathon thing was so clearly hipster misogyny. I didn’t cover it in my post as that wasn’t what the post was about. Remember what they said in their non-apology apology? “While we thought this was a fun, harmless comment poking fun at the fact that hack-a-thons are typically male-dominated, others were offended.”
Yeah, LOL GUYZ WHY SO SERIOUS ON THE SEXISM SHIT? That about sums it up.
[1] “We must respect the other fellow’s religion, but only in the sense and to the extent that we respect his theory that his wife is beautiful and his children smart.” (H. L. Mencken)
Amazon ElastiCache makes it easy for you to deploy, scale, and run a cloud-based in-memory cache that is protocol-compliant with Memcached. ElastiCache improves the performance of web applications and reduces the load on your databases by retrieving data from a fast, managed, Memcached-compatible, in-memory caching system, instead of relying entirely on disk-based storage. It can significantly improve throughput for read-heavy or compute-intensive workloads including Social Networking, Mobile and Social Gaming, E-Commerce Sites, Media Sites, and Recommendation Engines.
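Applications typically take load off a database with a cache like this by using the cache-aside pattern: check the cache first and fall back to the database only on a miss. The sketch below is an illustration under loud assumptions, not ElastiCache's or any Memcached client's API. A JavaScript `Map` with expiry times stands in for a Memcached node, `fetchFromDatabase` is a hypothetical slow lookup, and everything is synchronous, whereas real Memcached clients talk to the cache over the network asynchronously.

```javascript
// Minimal cache-aside sketch. A Map with expiry timestamps stands in for a
// Memcached/ElastiCache node; `fetchFromDatabase` is a hypothetical lookup.
function createCacheAside(fetchFromDatabase, ttlMs) {
  var cache = new Map(); // key -> { value, expiresAt }
  var stats = { hits: 0, misses: 0 };

  function get(key, now) {
    now = now === undefined ? Date.now() : now;
    var entry = cache.get(key);
    if (entry && entry.expiresAt > now) {
      stats.hits++; // served from memory, no database round trip
      return entry.value;
    }
    stats.misses++;
    var value = fetchFromDatabase(key); // fall back to the disk-based store
    cache.set(key, { value: value, expiresAt: now + ttlMs });
    return value;
  }

  return { get: get, stats: stats };
}
```

The time-to-live bounds how stale a cached value can get; read-heavy workloads benefit most because repeated reads of the same key skip the database entirely until the entry expires.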
In order to make ElastiCache an even better value, we are adding a full suite of Reserved Cache Node options -- Light, Medium, and Heavy with both 1 and 3 year terms. See the ElastiCache pricing for additional information. Reserved Cache Nodes can provide savings of up to 70% compared to On-Demand pricing. More information, including pricing, is available on our new Reserved Cache Nodes page.
You can easily migrate from Memcached to ElastiCache using our How Do I Migrate FAQ as a guide; you can also use the ElastiCache CloudFormation template to launch a cache cluster.
Finally, you'll also find useful information in the recorded version of our "Turbo-charge Your Apps Using Amazon ElastiCache" webinar.
-- Jeff;
From tax preparation to safe social networks, Amazon RDS brings new and innovative applications to the cloud
Empowering innovation is at the heart of everything we do at Amazon Web Services (AWS). I often get to meet, discuss with, and learn from innovators about how they are using AWS to deliver transformative applications to their users, customers and partners. Often we think about innovation as doing 'new things' or as building on revolutionary new technologies such as DynamoDB, but it is more important to ensure that one can also innovate based on existing paradigms. One of the services that is very successful in driving innovation at our customers in this context is Amazon RDS, the Relational Database Service. Amazon RDS removes the headaches of running a relational database service reliably at scale, allowing Amazon RDS customers to focus on innovation for their customers.
Recently I had great conversations with Troy Otillio, Senior Development Manager at Intuit and Jack Murgia, Senior DevOps Engineer at Edmodo. Troy and his team have added a contextual social offering to the popular TurboTax and Intuit applications. Jack and his engineers have created a safe social app for teachers and students. These innovators use Amazon RDS in conjunction with other Amazon Web Services to build, scale and operate their applications. Below is my dialog with them. Read on.
Note: If you want to see how Amazon RDS can enable your creative agenda, sign up for the 60 day free trial.
Troy, Jack, tell me a little bit about your app. What's unique and innovative about your service?
Troy: Live Community Platform is Intuit's flagship Contextual Social offering – Live Community makes it easy to find answers when and where you need them. This is a unique and innovative platform.
- Intelligent Social network - Facilitate topical Q&A conversations among employees, customers and our most valued super contributors.
- Large Seasonal Peaks – Our largest community supports TurboTax, where peak traffic during February or April is often hundreds of times greater than on a quiet day in June. Live Community Experience is deeply integrated into the tax experience, so we built a highly responsive and reliable web experience.
- Read-your-mind contextual integration – Our core innovation and underlying secret sauce involves selecting the most relevant content for a given page, if not a given user, to provide the right answer at the right time to our users.
Jack: Edmodo is the safe social network for education used by a network of over 6 million teachers and students worldwide that allows teachers to create and maintain their classroom communities. Some unique and innovative characteristics of Edmodo are:
- Edmodo is as easy to use as other social network sites, but secure - the teacher has the same control over access, content and behavior in Edmodo as he/she does in the classroom.
- Students gain experience they need for the modern workplace, learning how to work responsibly and effectively in a collaborative, project-based manner on a social website.
- Teachers can use Edmodo to share educational content, manage projects and assignments, handle notifications, and conduct quizzes and events.
- Teachers can interact with their colleagues in professional learning networks.
- Schools and districts can claim unique Edmodo web addresses for added communication and customization.
How are your users adopting and responding to your service?
Troy: We moved our service from internal servers to AWS. Our 25+ million strong TurboTax and Intuit user community grows every year and Live Community is an integral component of the overall product experience. Moving to AWS has given us the operational agility to deliver more value to those customers without having to worry about scale and infrastructure maintenance. We now have more time to focus on innovation while being confident that when demand increases we can easily add more capacity.
Jack: Since our launch in late 2008, we've grown to over 6 million teachers and students globally, primarily through word of mouth from teachers who have shared Edmodo with each other. In addition to using Edmodo to engage students in classroom activities, teachers all over the world build profile pages on Edmodo, which they use to discover and share content, meet and stay in contact with other educators, and share best practices and top resources.
Jack, how did this idea come about? How did you choose a SQL approach to solve this problem?
Jack: After years of seeing teachers struggle to share the web with their classroom, Edmodo founders Nic Borg and Jeff O'Hara knew there was a need for a highly scalable, secure social network targeted at K-12. SQL was the right choice because it was an established and proven technology for use in similar environments, and because of the massive knowledge base that exists around it.
And how about you Troy? Why did you choose a SQL approach to build your social community app?
Troy: The initial architecture was based on MySQL; we've continued to use SQL but are now leveraging RDS. Of course, with as much textual data as we have, we are leveraging Lucene/SOLR (a NoSQL solution) for search and semantic processing. More recently we've expanded our platform to include additional forms of user interaction observation in support of our real-time analytics; here we've begun to leverage NoSQL technologies like Redis. Going forward we'll continue to employ a hybrid approach, using RDS for the necessary transactional computation and services like DynamoDB for high performance and scalability for structured data.
What did you find unique about RDS? What has been your experience so far?
Troy: We love RDS – it's reduced our operational workload by a noticeable factor, but even more exciting are the benefits around fast recovery enabled by the Multi-Availability Zone capability. My team often brags about the one-click creation of read replicas, the ability to upsize or downsize the database without downtime, and automatic backups. However, the shining moment occurred just last month: during peak load there was a hardware failure on the server powering an RDS master database. RDS automatically failed over to the alternate zone within minutes, and our customers' experience was fully functional shortly thereafter. The best part was that the entire process was what I call "hands free" and took near-zero development effort. With self-hosted databases we would have invested considerable engineering effort to implement, test and retest failover; to achieve fast recovery with RDS we simply changed our configuration. And when the actual production event took place, the recovery required no manual intervention. The response from our CTO after hearing what happened: "that's cool".
We encountered a few situations that required help from the Amazon team. For example, we didn't know that the I/O capacity of the server is governed by the size of the storage and the size of the server. When we first attempted to load our production database it took 28 hours. After a few days of attempts to reduce the load time through well-known optimizations (mostly documented on the RDS website), we were stuck at 8 hours. We consulted directly with Amazon and learned that the storage and DB server size affected I/O throughput; after altering our size we dropped our load time to 1 hour, which was within expectations relative to a native database.
Jack: Based on our experience during this period of phenomenal growth while our team productivity is stretched to the max, we see that:
- RDS is a huge time-saver
- RDS provides peace of mind about our data
Anything that saves time and simplifies processes for employees of a young startup has a positive effect that CAN NOT be overstated. The peace of mind part needs no explanation. Nobody on our team regrets moving to RDS MySQL; quite the opposite, we all agree we don't want to think about where we would have been without RDS. We have been able to meet our goal of architecting our application for 0% "maintenance downtime".
- Out of the box, RDS' CloudWatch data and graphs speed up the troubleshooting process.
- Complete certainty in a DB environment is VERY unique; we never worry about:
  - whether our DB parameters are identical across replicas, and whether changes propagate at a time of my choosing
  - the recoverability of data
This is great to hear. We are glad RDS meets your needs. Now, what's next on your innovation agenda?
Jack: We want to deliver an even more performant and rock-solid experience for our global user base of teachers, students, administrators and parents. We will be:
- Building "incident managers" which utilize the CloudWatch data and AWS APIs to automatically replace servers and/or re-deploy when problems arise.
- Building "incident creators" - servers which test our ability to maintain peak performance.
We believe that by leveraging the services that Amazon provides to the fullest we can continue to scale our exceptional user experience so that Edmodo can be the platform for classrooms around the world on devices of all shapes and sizes.
Troy: At Intuit, we can go further to leverage the benefits of elasticity and further improve our resiliency. We are investing in use of CloudFormation coupled with Chef – the result will enable us to lower costs and further reduce risk.
Prior to AWS we had several tiers that we now think can be delegated to AWS services – this should free up our team to focus on our domain problems. For example, we are intending to replace our EC2/Memcache tier with ElastiCache, our batch processing with Simple Workflow and Web servers with CloudFront.
With our newfound agility we can launch new services quickly and there are a few on our plate in the near term. In some cases we are refactoring our system into smaller, discrete services, while in other cases we are creating wholesale new services. Our core problem domain consists of extracting greater value out of textual and behavioral data, which means that use of EMR and even the newly released Workflow should enable us to focus more on the domain and less on the system engineering.
Troy, Jack, thank you both very much for sharing your unique experience. I look forward to hearing your progress.
Jack: Thank you. This has been a great dialog.
Troy: Thank you Werner. We appreciate the opportunity.
As I noted before, it's a great pleasure to talk to these innovators and learn how AWS helps their journey. If you have never used RDS before, you can sign up for a 60 day free trial. What innovation will you bring to market? How will it change the world? We won't know until you try and build something.
We hung out with the DreamHost team for the first time at HostingCon in August 2011. They threw a great party, had awesome t-shirts, and exuded the kind of excitement and passion for hosting that CloudFlare has for making websites faster and more secure. We immediately knew we wanted to partner with them.
Fast forward nine months to today and we are happy to announce that DreamHost is now an official Certified Hosting Provider. Beginning today they're offering CloudFlare to all their customers with a one-click-simple integration. Prior to this partnership, we had thousands of DreamHost customers who signed up for CloudFlare directly through our site. Now, every DreamHost customer has simple, easy access to CloudFlare with a click of a button and without having to mess with their DNS. Other bells and whistles like mod_cloudflare are now included in all the default DreamHost configurations, so even existing CloudFlare users on the DreamHost network will benefit.
CloudFlare Plus
DreamHost is the latest CloudFlare Certified Hosting Partner, a program that makes CloudFlare one-click simple for any host to provide to its customers. We're trying a new experiment and allowing DreamHost to offer a special plan they've dubbed CloudFlare Plus. We worked with them to create this custom CloudFlare plan with the features they thought would be the most interesting for their customers and a price point lower than our current Pro product. For those folks for whom CloudFlare Pro is a bit more than they need, DreamHost now offers another option with some of our most popular paid features.
Just Sayin': CloudFlare ≈ Voltron
I do have to say that DreamHost will also always hold a special place in my heart for what must be the most fun press release I've ever seen, a full copy of which is below. If you only read one part, read the quote near the end that I bolded which is certifiably awesome.
FOR IMMEDIATE RELEASE
DreamHost Partners With CloudFlare
CloudFlare to Provide Free Site Performance Optimization and Security Services for all DreamHost customers
LOS ANGELES, California—April 5th, 2012—DreamHost, a global full-service web hosting company, has today announced a partnership with leading Internet web performance and security company, CloudFlare. DreamHost customers now have immediate access to CloudFlare's robust infrastructure at little or no cost as a standard feature of their hosting plans.
Shared web hosting customers, for many years, have rarely been able to take advantage of the benefits provided by Content Delivery Networks as a result of either the cost or complexity involved. CloudFlare has removed both barriers to entry by distilling the entire setup and configuration process down to a single checkbox and removing the cost entirely. CloudFlare brings the performance and security tools previously available only to Internet giants to anyone with a website.
Hundreds of nodes around the globe power CloudFlare's network ensuring that websites load quickly and consistently, regardless of where in the world users happen to be. CloudFlare's Anycast technology works with static and dynamic sites, routing users to the node on their network for the fastest performance — all without breaking a sweat!
CloudFlare's “Always Online” technology ensures that sites taking advantage of the CloudFlare platform will remain online, continuing to serve cached content, even if the hosting servers on which they are housed become temporarily unreachable. If the hosting industry had a holy grail, it might look a little something like CloudFlare.
In addition to CloudFlare's free offering, DreamHost has also worked with the CloudFlare team to create “CloudFlare Plus,” a bundling of CloudFlare's most popular features available exclusively to DreamHost customers. CloudFlare Plus is an optional paid upgrade, weighing in at $9.95 per month, and adds automatic image optimization and support for Secure Socket Layer (SSL) connections. Provisioning of either option has been integrated within the DreamHost customer control panel.
“When we first met with the CloudFlare team at HostingCon 2011 we had no idea if what they were telling us was true,” said Kathy Brahm, DreamHost's Vice President of Customer Experience and Partnerships. “You know how typical sales people can be — schmoozy, smiley, 'Let-me-buy-you-dinnery', handsy … all the while making outrageous claims about their product. CloudFlare's team has been the complete opposite of those — and a true pleasure to deal with. We've spent the past few weeks putting CloudFlare through its paces and running some tests of our own. Some of us nearly fainted when the first speed tests came back. One guy cried. Me? I buried my feelings deep inside so I don't have to deal with them. It's just what I do.”
“From days of long ago, from uncharted regions of the web, comes a legend; the legend of CloudFlare, Defender of the Interwebs: a mighty robot, loved by good, feared by evil,” explains Matthew Prince, co-founder and CEO of CloudFlare, doing his best Peter Cullen impersonation. “As CloudFlare's legend grew, peace settled across the network. From Los Angeles, an Interweb Alliance was led by DreamHost. Together with other good hosts of the network, DreamHost helped maintain peace throughout the Interwebs, until a new horrible menace threatened. A closer relationship with CloudFlare was needed. This is the story of the super force of web explorers, specially trained by DreamHost, to more tightly integrate CloudFlare, Defender of the Interwebs!”
CloudFlare's free offering and the DreamHost-exclusive “CloudFlare Plus” are now available to all DreamHost customers.
I've been following the products of WonderNetwork for a while as they do some interesting stuff with servers around the world. I particularly like Wonder VPN as a drop dead simple and reliable VPN is very handy for any mobile user who wants some security when using a wireless network in Starbucks!
Recently they have been working on a new product called Natural Load Testing which is intended to make load testing your web application very simple. It's a very friendly service which is immediately apparent on the home page where it says "not logged in, no greeting. how sad". It's a little touch that appeals to me.
Natural Load Testing is currently in private beta and the WonderNetwork guys invited me into the beta and have been generous enough to spend a small part of their marketing budget to enable me to be able to publish this article. They've even managed to cope with my list of complaints and still talk to me! In this tutorial I will walk through how to use the Natural Load Testing product (as is today) to get some results. Interpreting the results and improving your website and server infrastructure is your job though :)
Getting Started
Natural Load Testing (shortened to NLT in this article!) consists of 4 main areas where you interact with it:
- Creating tests
- Configuring test suites
- Running test suites
- Reviewing results
Upon logging in, the home page consists of a set of friendly buttons:

We start at the beginning: creating tests.
Creating tests
When load testing, you need to configure your testing tool with the URLs of the pages and resources that you want to be tested. Natural Load Testing makes this job trivial by cleverly leveraging their proxy technology. By configuring your browser to use the NLT proxy, it will record every page you visit and provide you with a list of URLs which you can then use to create tests.
Obviously, as load testing involves hitting the server with a lot of requests, by the time NLT is out of beta, you will need to authorise each domain that you want to use. This is done in the "Manage Authorised Domains" section.
There is a helpful page on the NLT site that explains how to do this. Don't forget to turn off the proxy after recording. The WonderNetwork people have also remembered that you may want to record HTTPS traffic for load testing, and have provided a root certificate that you'll need to install. Again, there's help available on how to install this.
Once you have recorded some URLs, you can then click on the "Create" button and you are presented with a list of the most recent URLs you have recorded.

To create the test, you select the URLs that you require to be included (the checkbox in the header selects all within the group, which is very handy!). You then choose a name that will help you remember what this test does and create the test.
In order to create some interesting load tests on our site, I created a good few tests which I could then group into test suites.
Configuring test suites
Your tests are grouped into suites that are then run either sequentially or as a set of random loads. To create a suite, you click on the "Configure" button which then shows you the list of the suites that you have already created.

In this screenshot you can see two test suites that I've created. The second suite, "ZF2 Tutorial only", is the simplest suite possible as it contains a single page load within a single step. The top suite is more complex and hints at the kinds of suites that you can create.
A suite is composed of one or more steps which are run in sequence. Each step can have multiple tests. If there are multiple tests in the same step, then NLT will randomly pick one test for each run.
In this suite I have four tests in step 1 which are a set of pages within the BRI website. Step 2 is the contact page and so this test suite is measuring visiting one of 4 pages within the site and then choosing to go to the contact page.
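The step and test structure described above can be modelled in a few lines. This is a hypothetical sketch of the selection logic, not NLT's code. The page paths are made up, since the four BRI pages aren't listed, and `random` can be injected to make a run reproducible.

```javascript
// Hypothetical model of an NLT-style suite: steps run in order, and when a
// step holds several tests, one is chosen at random for each run.
function pickRun(suite, random) {
  random = random || Math.random;
  return suite.steps.map(function (step) {
    var i = Math.floor(random() * step.length); // uniform pick within the step
    return step[i];
  });
}

// Illustrative suite mirroring the one above (paths are placeholders).
var briSuite = {
  name: "Browse then contact",
  steps: [
    ["/", "/about", "/services", "/blog"], // step 1: one of four pages
    ["/contact"]                           // step 2: always the contact page
  ]
};
```

Each call to `pickRun(briSuite)` yields one simulated visitor's path, for example one of the four pages followed by `/contact`.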
Creation of a new test suite is done via drag and drop:

You simply drag your green tests over to the list of steps on the right hand side. It's all quite easy. Rather weirdly, you can't reuse the same test in multiple steps, so if you're testing a cycle, then you need to duplicate the same test so that you can place it into two different steps. I haven't found a way to edit the steps within a test suite once created either, so make sure you get it right!
Once you have added tests to steps, you can then tell NLT to send data whilst load testing. This is useful for filling in forms or logging into the website, for instance. I haven't used this section as I haven't yet tested any pages with forms or requiring login.
Rather usefully, you can change the domain that a test suite uses. This enables you to duplicate a test suite and then change the domain to test a domain that's running on another server or using a different configuration, which makes side-by-side testing a little easier.
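Changing a suite's domain amounts to rewriting the host of every recorded URL while keeping paths and query strings intact. A minimal sketch of that idea, assuming a suite can be treated as a plain list of URL strings (NLT's internal format isn't described). It uses the standard `URL` class available in browsers and Node.js; the hostnames are placeholders.

```javascript
// Re-point recorded URLs at another host for side-by-side testing.
function retargetUrls(urls, newHost) {
  return urls.map(function (u) {
    var url = new URL(u);
    url.host = newHost; // protocol, path and query string are left unchanged
    return url.toString();
  });
}
```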
Calibration
Once we have created a test suite, the next step is to calibrate it. NLT will not allow you to run a test suite before calibration, so don't forget this bit! Calibration is done from the test suites list page where there is a drop down box of operations you can do on each suite:

Simply pick Calibrate and press Go. NLT will then perform a single run over your suite and present the standard results page. Ideally, I would like to see this page look different from the standard results page as it's a calibration run, not a standard test and so I was slightly confused at this point. Also, whilst the test is running, you simply see one of those "spinning gifs" to let you know that something is happening. I ran into a bug in this section which resulted in the calibration run failing, but I was never notified on the page that an error had occurred.
The calibration results on my ZF2 tutorial page suite look like this:

The intent of calibration is to give NLT a baseline for the performance of the suite when there is no load. The idea is that this single run provides an indication of optimal conditions, and we expect to stay within 15% of this performance when testing under load.
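The 15% rule can be expressed as a simple comparison against the calibration time. This is a sketch of the rule as this article describes it, not NLT's implementation; the function name and the `tolerance` parameter are hypothetical.

```javascript
// Flags a response as slow when it exceeds the calibration time by more than
// the tolerance (15% by default, matching the threshold described here).
function compareToCalibration(responseMs, calibrationMs, tolerance) {
  tolerance = tolerance === undefined ? 0.15 : tolerance;
  var limit = calibrationMs * (1 + tolerance);
  return {
    deltaMs: responseMs - calibrationMs, // difference from the calibration run
    slow: responseMs > limit             // beyond the allowed tolerance
  };
}
```

Using the figures from the later run in this article (the main HTML took 760ms, 411ms longer than calibration, so a calibration time of 349ms), `compareToCalibration(760, 349)` gives `{ deltaMs: 411, slow: true }`.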
Running test suites
Now that we have calibrated test suites, we can run some load tests and see what happens. In NLT parlance, we run test suites by pressing the Play button, at which point the terminology changes from play to execute. Simply select your test suite and press the Execute button. You are then presented with a form in order to set the run parameters:

There are three parameters you need to set:
| Parameter | Meaning |
|---|---|
| Concurrent Users | How many users will hit your test suite. For steps with multiple tests, each user will randomly pick one to run. |
| Total Executions | How many times the suite will be run. |
| Spin up Delay | Number of milliseconds before introducing the next user. A delay of 500 will result in 2 users per second being introduced to the load, up to the total number of concurrent users. |
We set these numbers up to create the required load testing profile. I've been using 100 users at 250ms spin-up for 2000 total executions as this takes around 30 to 60 seconds to complete a run and seems a reasonable profile for my blog given its traffic levels.
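The spin-up delay translates directly into how quickly load builds. A small calculation, assuming the first user starts at time zero and each later user is added one delay after the previous one (exactly how NLT schedules the first user isn't stated):

```javascript
// Back-of-the-envelope ramp-up figures for a run profile.
function rampUp(concurrentUsers, spinUpDelayMs) {
  return {
    usersPerSecond: 1000 / spinUpDelayMs,
    // the last user starts (n - 1) delays after the first one
    fullLoadAfterMs: (concurrentUsers - 1) * spinUpDelayMs
  };
}
```

For the 100 users at 250ms profile above, `rampUp(100, 250)` gives 4 users per second and full concurrency after 24,750ms, so roughly the first 25 seconds of a 30 to 60 second run are spent ramping up.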
Upon pressing Execute, NLT will spin up a number of worker processes on its servers and then display a graph that updates every few seconds showing you what's happening:

As you can see in the screenshot above, you get this information:
- A bar chart showing the number of active requests (blue) on top of the number of requests initiated (red).
- A line chart of median response time.
The x-axis is in seconds, so the red bar shows the number of requests initiated during each second and the blue bar shows the number of requests that were still active at the end of that second. A request that completed within the second is therefore counted in the red bar but not in the blue one. Ideally, we would like to see smaller blue bars than red bars, and we definitely don't want to see the blue bars getting bigger.
The green line shows us the average time it takes to serve all requests in that second. Ideally, we want this number to be as low as possible. Clearly, if there are any errors serving a request (e.g. an nginx bad gateway error), the rather fast failure time is not counted, as it would bring the average down!
We can also view the data as a table:

Obviously, this is right at the start of the run, and so everything looks good. Clicking on a given run ID will provide us with more information. This is a run that's later in the test:

As you can see, the server was struggling now! The red background means that the request took more than 15% longer than the calibration run, and the difference from calibration is shown in brackets in the Response Time column. In this particular run, the main HTML took 760ms to deliver, which was 411ms longer than the calibration run. We're also struggling to serve images in a sensible time. This is clearly not ideal!
Looking at what's going on on the server whilst running a load test is also instructive. A good introduction to the server tools that are useful is 16 Linux server monitoring commands you really need to know by Steven Vaughan-Nichols.
Reviewing results
Once you have a few runs under your belt, the Review page becomes useful. In this page we can see a list of all our previous runs and most importantly, we can edit the title of each run and give it a useful name. It would be nice to be able to store additional notes about each run though.

Adding the title makes it much easier to remember why a given run was performed and I've found it useful.
Using Natural Load Testing to improve performance
The obvious first target for testing NLT was my blog at akrabat.com. This is a simple WordPress blog which recently moved to a new server. I haven't been particularly worried about performance and don't really expect to ever be slash-dotted (or is it Cal-dotted, nowadays!) any time soon.
However, as I've done nothing to tune the system, this is a good time to see how it behaves. The first run produced these results:

This isn't good! The only good sign was that none of the 2000 runs actually resulted in an error. A median response time around 1.5 seconds didn't sound good and the maximum run time was 13 seconds!
Something had to be done!
A little bit of investigating showed that neither APC nor WP Super Cache was installed on the server. So I turned them on.
The effect of APC can be seen here:

We now have an average response time of much less than 500ms which seems much saner. There were a number of very long requests of over 1 second though.
To see if I could improve this further, I then installed and enabled WP Super Cache in PHP mode and re-ran Natural Load Testing. The results were:

Note that you need to be careful when comparing graphs as the axes are automatically scaled. Looking at the results with WP Super Cache enabled, I can see that the response time display of the graph is much smoother. The tabular data backs this up as it shows that the vast majority of requests were taking less than 200ms, with significantly fewer runs taking much longer. Also, the curve of the number of requests initiated vs requests active at the end of each second is much smoother, which indicates that with WP Super Cache enabled, the server should be able to hold its own over a longer time period much more easily.
I tested this hypothesis by doing 5000 runs rather than the 2000 shown above. The result was:

Again, the median response time is nicely under 150ms for most of the time, but we have more variance in the response time, with a couple of seconds where the median was significantly over 250ms.
On the whole, my investigations with NLT have made my blog much more responsive and more likely to be able to handle more traffic than before I started this process.
Other thoughts
Having walked through what Natural Load Testing does, I also need to point out that when using it, you can tell that this is a product still in beta! To their credit, the WonderNetwork people have responded to my reports quickly, even if half the time it was to let me know that my idea or complaint was already on their list!
Nearly all the issues I have with NLT at the moment are related to usability, which I'm sure will be ironed out over time. My "favourite" annoyance as noted above is on the create test suite page where the Save button is above the section where you set up the suite's steps. If you accidentally press Save after entering a title, then you find you can't actually add any tests to the steps! There's also no menu bar or easy way to get from one section to another. You very quickly learn to click on the NLT logo at the top which takes you to the home page.
It would also be good if NLT would allow me to store "execution profiles" so that I don't have to keep typing in that I want 100 users for 2000 total runs.
I would also like to see more aggregate results. In particular, I'd like to have stats on the response time for say the 50th and 95th percentile. I'd also like to graph these against multiple runs along with the run title so I can see how the site's performance changed over time. This then leads to the idea that it would be nice if I could schedule a load test once a month at 3am local time!
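The 50th and 95th percentiles wished for above are easy to compute once you have raw per-request response times. A minimal sketch using the nearest-rank method, assuming the times (in ms) can be obtained from a run; whether NLT offers such an export is not confirmed here, and the sample values are made up.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. `times` are response times in ms.
function percentile(times, p) {
  if (times.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...times].sort((a, b) => a - b); // numeric sort, not lexicographic
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical response times (ms) from one run.
const runTimes = [120, 135, 140, 150, 160, 180, 210, 240, 900, 1300];
console.log("p50:", percentile(runTimes, 50)); // p50: 160
console.log("p95:", percentile(runTimes, 95)); // p95: 1300
```

Tracking these two numbers per run title over time would give the "performance across runs" graph described above.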
Conclusion
Natural Load Testing is the first product I've used that makes me actually want to do load testing. It makes it easy to run a specific load test and repeat exactly the same test as frequently as you need to. Tweaking the server to see the effect that any given change has becomes an interesting task, and your websites can only benefit as you reduce and remove the bottlenecks that you find.
On the whole, I'm quite impressed with Natural Load Testing and can see that it could turn into a very useful tool in the web developer's toolbox.
As I mentioned at the top, NLT is currently in private beta. If you want to get an invite, head over to the sign up form and fill in your details. You also have to read the "thank you for signing up" message!
This post tells a story.
A long time ago, I set out to write my own blog platform. Yes, WordPress is a fine blogging platform, as is Serendipity (aka "s9y", and my previous platform). And yes, I know about Habari. And, for those of you skimming ahead, yes, I'm quite aware of Jekyll, thank you anyways.
Why write something of my own? Well, of course, there's the fact that I'm a developer, and have control issues. Then there's also the fact that a blog is both a simple enough domain to allow easily experimenting with new technology and paradigms, while simultaneously providing a complex enough domain to expose non-trivial issues.
When I started this project, it was a technology-centered endeavor; I wanted to play with document databases such as CouchDB and MongoDB, and with caching technologies like memcached and redis.
Not long after I started, I also realized it was a great playground for me to prototype ideas for ZF2; in fact, the original DI and MVC prototypes lived as branches of my blog. (My repository is still named "zf2sandbox" to this day, though it technically houses just my site.)
Over time, I had a few realizations. First, my actual blog was suffering. I wasn't taking the time to perform security updates, nor even normal upgrades, and was so far behind as to make the process non-trivial, particularly as I had a custom theme, and because I was proxying to my blog via a ZF app in order to facilitate a cohesive site look-and-feel. I needed to either sink time into upgrading, or finish my blog.
My second realization, however, was the more important one: I wanted a platform where I could write how I want to write. I am a keyboard-centric developer and computer user, and while I love the web, I hate typing in its forms. Additionally, my posts often take longer than a typical browser session -- which leaves me either losing my work in a GUI admin, or having to write first in my editor of choice, and then cut-and-paste it to the web forms. Finally, I want versions I can easily browse with standard diffing tools.
When it came down to it, my blog content is basically static. Occasionally, I'll update a post, but it's rare. Comments are really the only dynamic aspect of the blog... and what I had with s9y was not cutting it, as I was getting more spam than I could keep up with. New commenting platforms such as Livefyre and Disqus provide more features than most blogging platforms I know, and provide another side benefit: because they are javascript-based, you can simply drop a small amount of markup into your post once, meaning your pages can be fully static!
Add these thoughts to the rise of static blogging platforms such as the aforementioned Jekyll, and I had a kernel of an idea: take the work I'd done already, and create a static blog generator.
Unlike Zend Framework 1, the view layer in Zend Framework 2 separates the variables assigned to each view model. This means that when you are in the layout view script, you don't automatically have access to variables that were assigned to the action's view model, and vice versa.
Accessing action variables in the layout
Consider this controller code:
class IndexController extends ActionController
{
    public function indexAction()
    {
        return array('myvar' => 'test');
    }
}
If you are in layout.phtml, you retrieve this value like this:
layout.phtml:
<?php
$children = $this->viewModel()->getCurrent()->getChildren();
$child = $children[0];
?>
<!-- some HTML -->
<?php echo $this->escape($child->myvar); ?>
If you really want to make sure you collect the correct child view model, then you could iterate over $children and look for the child that has the correct captureTo name set. For the action's view model, this defaults to content:
layout.phtml:
<?php
$children = $this->viewModel()->getCurrent()->getChildren();
$contentModel = null;
foreach ($children as $child) {
    if ($child->captureTo() == 'content') {
        $contentModel = $child;
        break;
    }
}
?>
<!-- some HTML -->
<?php if ($contentModel) echo $this->escape($contentModel->myvar); ?>
Accessing layout variables in the action view
If you have assigned a variable to the layout's view model in, say, an event listener within Module.php:
Module.php:
public function onBootstrap($e)
{
    $application = $e->getParam('application');
    $viewModel = $application->getMvcEvent()->getViewModel();
    $viewModel->some_config_var = '12345';
}
This is how you access some_config_var in the action view:
view/index/index.phtml:
<?php echo $this->escape($this->layout()->some_config_var); ?>
Another, more long-winded way is to use the getRoot() method of the viewModel view helper:
view/index/index.phtml:
<?php $layoutViewModel = $this->viewModel()->getRoot(); ?>
<!-- Some HTML -->
<?php echo $this->escape($layoutViewModel->some_config_var); ?>
Setting configuration variables into the view
It therefore follows that if you need to set a variable that can be accessed from any view script, it's easiest to set it into the layout's view model and then access it via the layout() view helper. This is handy for view-layer config variables that you want to store in your config files, such as a Google search API key.
Application/config/module.config.php:
<?php
return array(
    'layout' => array(
        'google_search_api_key' => '1234567890',
    ),
);
Application/Module.php:
public function onBootstrap($e)
{
    $application = $e->getParam('application');
    $config = $e->getParam('config');
    $viewModel = $application->getMvcEvent()->getViewModel();
    $viewModel->config = $config->layout;
}
view/search/index.phtml:
<?php echo $this->layout()->config->google_search_api_key; ?>
In Making a mobile connection I describe how after just a few seconds of inactivity your mobile phone demotes the radio link to your carrier network. It typically takes 1-2 seconds to re-establish the radio link to full bandwidth capacity. This is a huge delay!
A few days ago I was discussing desktop vs. mobile page load times with some web performance wonks. These times were gathered from real users via the W3C Nav Timing API. We started chatting about why the mobile times were worse – slower connection speeds, less cache space, etc. – and it hit me that taking 2 seconds to re-establish the radio link might account for much of what makes mobile sites slower, especially in RUM (Real User Monitoring) vs. synthetic testing. And I wondered:
After some initial testing it looks like the answers are:
I started by creating a Nav Timing test page that shows the values from Nav Timing. If you load the page you’ll see something like this. (Please look at page source to see how I calculate these conceptual time values.)
total time = 239 ms
dns = 119 ms
connect = 16 ms
ttfb = 61 ms
HTML = 0 ms
frontend = 42 ms
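The page's own formulas live in its source, which isn't reproduced here, so the mapping below from Navigation Timing (Level 1) attributes to these conceptual phases is an assumption. With the made-up timestamps used, it yields the same numbers as the sample output above.

```javascript
// Assumed mapping from Navigation Timing attributes (epoch ms) to the
// conceptual phases on the test page. Phases need not sum exactly to
// `total`, since small gaps (e.g. request queuing) fall between them.
function navPhases(t) {
  return {
    total: t.loadEventStart - t.navigationStart,
    dns: t.domainLookupEnd - t.domainLookupStart,
    connect: t.connectEnd - t.connectStart,
    ttfb: t.responseStart - t.requestStart,
    html: t.responseEnd - t.responseStart,
    frontend: t.loadEventStart - t.responseEnd,
  };
}

// In a browser you would pass window.performance.timing; this hypothetical
// object keeps the sketch runnable outside one.
const sampleTiming = {
  navigationStart: 1000,
  domainLookupStart: 1000,
  domainLookupEnd: 1119,
  connectStart: 1119,
  connectEnd: 1135,
  requestStart: 1136,
  responseStart: 1197,
  responseEnd: 1197,
  loadEventStart: 1239,
};
// values: total 239, dns 119, connect 16, ttfb 61, html 0, frontend 42
console.log(navPhases(sampleTiming));
```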
NOTE: Nav timing is available in Android 4. I’m not aware of any other mobile platform that has it, so you’ll need an Android 4 device to run these tests. You should close all/most currently running apps on your mobile device as they might be keeping the radio link alive in the background. On Android 4 this is done under Settings | Apps | Running. I had to stop Google Services.
You can determine if the radio link promotion delay occurs based on whether any of the times are greater than 2 seconds. Here’s a key:
- no 2 second times
- If all of the times are less than 2 seconds then the radio link was already active. You can create this result by loading the page multiple times in quick succession. All the times should be pretty fast because you have a radio link, the DNS resolution is cached, and you have a persistent connection to the web server.
- dns > 2 seconds
- If you wait 10-20 seconds (having closed all background apps), the radio link gets demoted. At this point, clicking one of the buttons to open the test page on another domain forces a DNS lookup. Normally the DNS lookup should take a few hundred milliseconds, but if the radio link needs to be promoted, the DNS time jumps to 2000+ milliseconds. This page is hosted on three different domains. Once you have used all three pages, and thus cached all three DNS resolutions, the only way I know of to clear the DNS cache is to power cycle the phone.
- connect > 2 seconds
- If you allow the radio link to be demoted by waiting 10-20 seconds and reload the page (or click the button for the same page) you might see the connect time is greater than 2 seconds. This happens when the DNS is cached but there’s no persistent connection to the server. This is harder to reproduce – it depends on the browser’s policy for closing persistent connections.
- ttfb > 2 seconds
- If the radio link is demoted, the DNS is cached, and there’s a persistent connection to the server you’ll see the time-to-first-byte (ttfb) is greater than 2 seconds. This is what happens most frequently when you load the same page multiple times with a 10-20 second gap in-between.
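The key above boils down to a small decision rule: the promotion delay shows up in whichever phase is the first to touch the network. A sketch of that rule; the 2000 ms cutoff is the rough figure from the key, not a precise constant, and the phase object shape is an assumption.

```javascript
// Rough cutoff from the key: a phase over ~2 s suggests radio link promotion.
const RADIO_PROMOTION_MS = 2000;

// The delay lands in the first phase that needs the network: dns if the
// lookup isn't cached, else connect if there's no persistent connection,
// else ttfb. Returns null when no phase exceeds the cutoff.
function radioDelayPhase(phases) {
  for (const name of ["dns", "connect", "ttfb"]) {
    if (phases[name] > RADIO_PROMOTION_MS) return name;
  }
  return null;
}

console.log(radioDelayPhase({ dns: 0, connect: 0, ttfb: 2250 })); // ttfb
```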
It’s important that developers focusing on performance be aware of the impact of radio link promotion on nav timing for mobile traffic so you don’t waste time solving the wrong problem: If you’re gathering RUM data via nav timing and see slow DNS times, you might think about investing in your DNS infrastructure – even though those slow DNS times might be caused by radio link promotion. Similarly, if you see long connection times it might not make sense to investigate how your servers manage persistent connections. And slow time-to-first-byte values may or may not indicate a backend app layer performance problem.
My website doesn’t generate enough mobile traffic to verify this theory, but I believe that websites with enough mobile nav timing data will see bimodal distributions of their timing data for dns, connection, and ttfb where the modes are ~2 seconds apart. If anyone has enough data (you know who you are) please take a look and comment below. It might be possible to develop heuristics that help us determine when radio link delays are having an impact. I’d love to get some stats on the percentage of page views that incur this delay.
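One way to look for the bimodal pattern described above is to bucket RUM samples for a single phase, look for a second cluster roughly 2 seconds above the first, and count how many page views crossed a 2-second threshold. A rough sketch with made-up ttfb samples; the 250 ms bin width and the threshold are assumptions, not validated heuristics.

```javascript
// Count samples per fixed-width bin; returns [binStartMs, count] pairs.
function histogram(values, binMs) {
  const bins = new Map();
  for (const v of values) {
    const start = Math.floor(v / binMs) * binMs;
    bins.set(start, (bins.get(start) || 0) + 1);
  }
  return [...bins.entries()].sort((a, b) => a[0] - b[0]);
}

// Fraction of samples strictly above the threshold.
function shareAbove(values, thresholdMs) {
  if (values.length === 0) return 0;
  return values.filter((v) => v > thresholdMs).length / values.length;
}

// Hypothetical ttfb samples (ms) from mobile page views.
const ttfbSamples = [80, 95, 110, 120, 2090, 2150, 130, 100, 2200, 90];
console.log(histogram(ttfbSamples, 250)); // two clusters: bin 0 (7), bin 2000 (3)
console.log(shareAbove(ttfbSamples, 2000)); // 0.3
```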
Following yesterday's article on returning JSON from a ZF2 controller action, Lukas suggested that I should also demonstrate how to use the Accept header to get JSON. So this is how you do it!
Set up the JsonStrategy
We set up the JsonStrategy as we did in returning JSON from a ZF2 controller action.
Return a ViewModel from the controller
As we're letting the JsonStrategy intercede for us, we don't need to do anything special in our controller at all. In this case, we simply return a normal ViewModel for use by either the JsonRenderer or PhpRenderer as required:
module/Application/src/Application/Controller/IndexController.php:
<?php
namespace Application\Controller;

use Zend\Mvc\Controller\ActionController,
    Zend\View\Model\ViewModel;

class IndexController extends ActionController
{
    public function anotherAction()
    {
        $matches = array();
        $matches[] = array('distance' => 10, 'playground' => array('a' => 1));
        $matches[] = array('distance' => 20, 'playground' => array('a' => 2));
        $matches[] = array('distance' => 30, 'playground' => array('a' => 3));

        $result = new ViewModel(array(
            'success' => true,
            'results' => $matches,
        ));
        return $result;
    }
}
with our HTML view script:
module/Application/view/index/another.phtml:
<?php if ($success): ?>
<h2>Results</h2>
<ul>
<?php foreach ($results as $row): ?>
    <li>Distance: <?php echo $this->escape($row['distance']); ?>m</li>
<?php endforeach; ?>
</ul>
<?php endif; ?>
So if you set up a route and browse to it, you'll see a nicely rendered page.
Retrieving the data as JSON
To retrieve the data via JSON, we need a client where we can set the Accept header. We'll use curl for this test. When doing anything with APIs and testing, we head over to LornaJane's blog for the Curl Cheat Sheet and use this command line:
curl -H "Accept: application/json" http://zf2test.dev/json/another
and you should see the output of:
{
"content":{
"success":true,
"results": [
{"distance":10,"playground":{"a":1}},
{"distance":20,"playground":{"a":2}},
{"distance":30,"playground":{"a":3}}
]
}
}
(Formatted for readability - you get the result back on a single line from curl.)
This way you can use the same controllers for your HTML views and for returning JSON to those clients that can use it.
The new view layer in Zend Framework 2 can be set up to return JSON rather than rendered HTML relatively easily. There are two steps to this:
Set up the JsonStrategy
Firstly, we need to set up the view's JsonStrategy to check for situations where returning JSON is required and then to render the JSON for us. The JsonStrategy will cause the JsonRenderer to be run in two situations:
- The view model returned by the controller action is a JsonModel
- The HTTP Accept header sent in the request includes "application/json"
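The choice between the two renderers can be sketched in a few lines. This is a language-neutral illustration in JavaScript, not Zend Framework code: the `{ type: ... }` model objects are hypothetical stand-ins, and the real JsonStrategy's Accept-header parsing (for example its treatment of q-values) may differ.

```javascript
// Returns which renderer would handle the response under the two rules
// listed above. q-values are ignored here for simplicity.
function selectRenderer(model, acceptHeader) {
  // Rule 1: the action explicitly returned a JSON model.
  if (model && model.type === "JsonModel") return "JsonRenderer";
  // Rule 2: the client listed application/json among acceptable types.
  const accepted = (acceptHeader || "")
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase());
  if (accepted.includes("application/json")) return "JsonRenderer";
  return "PhpRenderer";
}

console.log(selectRenderer({ type: "ViewModel" }, "application/json")); // JsonRenderer
console.log(selectRenderer({ type: "ViewModel" }, "text/html")); // PhpRenderer
```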
To enable the JsonStrategy, we simply attach it to the view's event manager with a reasonably high priority. This can be done in our Application's Module class. Firstly, we attach an onBootstrap() callback to the bootstrap event, and then we implement onBootstrap() to attach the JsonStrategy:
module/Application/Module.php:
class Module implements AutoloaderProvider
{
    public function init(Manager $moduleManager)
    {
        $events = StaticEventManager::getInstance();
        $events->attach('bootstrap', 'bootstrap', array($this, 'onBootstrap'));
    }

    public function onBootstrap(Event $e)
    {
        $application = $e->getParam('application');
        /* @var $application \Zend\Mvc\Application */
        $locator = $application->getLocator();
        $view = $locator->get('Zend\View\View');
        $jsonStrategy = $locator->get('Zend\View\Strategy\JsonStrategy');
        $view->events()->attach($jsonStrategy, 100);
    }

    // more methods such as getConfig() and getAutoloaderConfig()
}
As you can see, in init() we grab the StaticEventManager to attach our onBootstrap() method to the bootstrap event. Then, within onBootstrap(), we grab the view and the JsonStrategy from the locator (via application) and attach the JsonStrategy to the view's events() event manager.
Return a JsonModel from the controller action
To send JSON to the client when the Accept header isn't application/json, we use a JsonModel in a controller action like this:
module/Application/src/Application/Controller/IndexController.php:
<?php
namespace Application\Controller;

use Zend\Mvc\Controller\ActionController,
    Zend\View\Model\ViewModel,
    Zend\View\Model\JsonModel;

class IndexController extends ActionController
{
    public function indexAction()
    {
        $result = new JsonModel(array(
            'some_parameter' => 'some value',
            'success' => true,
        ));
        return $result;
    }
}
The output will now be JSON. Obviously, if you're sending JSON back based on the Accept header, then you can return a normal ViewModel.
This past December I contributed an article called Frontend SPOF in Beijing to PerfPlanet’s Performance Calendar. I hope that everyone who reads my blog also reads the Performance Calendar – it’s an amazing collection of web performance articles and gurus. But in case you don’t, I’m cross-posting it here. I saw a great presentation from Pat Meenan about frontend SPOF and want to raise awareness around this issue. This post contains some good insights.
Make sure to read PerfPlanet – it’s a great aggregator of WPO blog posts.
Now – flash back to December 2011…
I’m at Velocity China in Beijing as I write this article for the Performance Calendar. Since this is my second time to Beijing I was better prepared for the challenges of being behind the Great Firewall. I knew I couldn’t access popular US websites like Google, Facebook, and Twitter, but as I did my typical surfing I was surprised at how many other websites seemed to be blocked.
Business Insider
It didn’t take me long to realize the problem was frontend SPOF – when a frontend resource (script, stylesheet, or font file) causes a page to be unusable. Some pages were completely blank, such as Business Insider:
Firebug’s Net Panel shows that anywhere.js is taking a long time to download because it’s coming from platform.twitter.com – which is blocked by the firewall. Knowing that scripts block rendering of all subsequent DOM elements, we form the hypothesis that anywhere.js is being loaded in blocking mode in the HEAD. Looking at the HTML source we see that’s exactly what is happening:
<head>
...
<!-- Twitter Anywhere -->
<script src="https://platform.twitter.com/anywhere.js?id=ZV0...&v=1" type="text/javascript"></script>
<!-- / Twitter Anywhere -->
...
</head>
...
<body>
If anywhere.js had been loaded asynchronously this wouldn’t happen. Instead, since anywhere.js is loaded the old way with <SCRIPT SRC=..., it blocks all the DOM elements that follow, which in this case is the entire BODY of the page. If we wait long enough, the request for anywhere.js times out and the page begins to render. How long does it take for the request to time out? Looking at the “after” screenshot of Business Insider, we see it takes 1 minute and 15 seconds for the request to time out. That’s 1 minute and 15 seconds that the user is left staring at a blank white screen waiting for the Twitter script!
CNET
CNET has a slightly different experience; the navigation header is displayed but the rest of the page is blocked from rendering:
Looking in Firebug we see that wrapper.js from cdn.eyewonder.com is “pending” – this must be another domain that’s blocked by the firewall. Based on where the rendering stops our guess is that the wrapper.js SCRIPT tag is immediately after the navigation header and is loaded in blocking mode thus preventing the rest of the page from rendering. The HTML confirms that this is indeed what’s happening:
<header> ... </header>
<script src="http://cdn.eyewonder.com/100125/771933/1592365/wrapper.js"></script>
<div id="rb_wrap">
<div id="rb_content">
<div id="contentMain">
O’Reilly Radar
Every day I visit O’Reilly Radar to read Nat Torkington’s Four Short Links. Normally Nat’s post is one of many stories on the Radar front page, but going there from Beijing shows a page with only one story:
At the bottom of this first story there’s supposed to be a Tweet button. This button is added by the widgets.js script fetched from platform.twitter.com, which is blocked by the Great Firewall. This wouldn’t be an issue if widgets.js were fetched asynchronously, but sadly a peek at the HTML shows that’s not the case:
<a href="http://www.stevesouders.com/blog...">Comment</a> |
<span class="social-counters">
<span class="retweet">
<a href="http://twitter.com/share" class="twitter-share-button"
   data-count="horizontal"
   data-url="http://radar.oreilly.com/2011/12/four-short-links-6-december-20-1.html"
   data-text="Four short links: 6 December 2011"
   data-via="radar"
   data-related="oreillymedia:oreilly.com">Tweet</a>
<script src="http://platform.twitter.com/widgets.js" type="text/javascript"></script>
</span>
The cause of frontend SPOF
One possible takeaway from these examples might be that frontend SPOF is specific to Twitter and eyewonder and a few other 3rd party widgets. Sadly, frontend SPOF can be caused by any 3rd party widget, and even from the main website’s own scripts, stylesheets, or font files.
Another possible takeaway from these examples might be to avoid 3rd party widgets that are blocked by the Great Firewall. But the Great Firewall isn’t the only cause of frontend SPOF – it just makes it easier to reproduce. Any script, stylesheet, or font file that takes a long time to return has the potential to cause frontend SPOF. This typically happens when there’s an outage or some other type of failure, such as an overloaded server where the HTTP request languishes in the server’s queue for so long the browser times out.
The true cause of frontend SPOF is loading a script, stylesheet, or font file in a blocking manner. The table in my frontend SPOF blog post shows when this happens. It’s really the website owner who controls whether or not their site is vulnerable to frontend SPOF. So what’s a website owner to do?
Avoiding frontend SPOF
The best way to avoid frontend SPOF is to load scripts asynchronously. Many popular 3rd party widgets do this by default, such as Google Analytics, Facebook, and Meebo. Twitter also has an async snippet for the Tweet button that O’Reilly Radar should use. If the widgets you use don’t offer an async version you can try Stoyan’s Social button BFFs async pattern.
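For widgets without an async snippet, the usual pattern is to create the SCRIPT element from JavaScript so the download doesn't block rendering of the rest of the page. A minimal sketch of that pattern, not any particular vendor's snippet; `doc` is injected so it can run outside a browser, and the widget URL is hypothetical. Note that a hung async script can still delay the window onload event, but the page content itself renders.

```javascript
// Create the SCRIPT element dynamically so the download doesn't block
// parsing/rendering of the markup that follows. `doc` is normally `document`.
function loadScriptAsync(doc, src) {
  const script = doc.createElement("script");
  script.src = src;
  script.async = true; // explicit; script-inserted scripts don't block the parser anyway
  const first = doc.getElementsByTagName("script")[0];
  if (first && first.parentNode) {
    // Insert next to an existing SCRIPT tag, as many async snippets do.
    first.parentNode.insertBefore(script, first);
  } else {
    doc.head.appendChild(script);
  }
  return script;
}

// In a page (hypothetical widget URL):
// loadScriptAsync(document, "https://widgets.example.com/widget.js");
```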
Another solution is to wrap your widgets in an iframe. This isn’t always possible, but in two of the examples above the widget is eventually served in an iframe. Putting them in an iframe from the start would have avoided the frontend SPOF problems.
For the sake of brevity I’ve focused on solutions for scripts. Solutions for font files can be found in my @font-face and performance blog post. I’m not aware of much research on loading stylesheets asynchronously. Causing too many reflows and FOUC are concerns that need to be addressed.
Call to action
Business Insider, CNET, and O’Reilly Radar all have visitors from China, and yet the way their pages are constructed delivers a bad user experience where most if not all of the page is blocked for more than a minute. This isn’t a P2 frontend JavaScript issue. This is an outage. If the backend servers for these websites took 1 minute to send back a response, you can bet the DevOps teams at Business Insider, CNET, and O’Reilly wouldn’t sleep until the problem was fixed. So why is there so little concern about frontend SPOF?
Frontend SPOF doesn’t get much attention – it definitely doesn’t get the attention it deserves given how easily it can bring down a website. One reason is it’s hard to diagnose. There are a lot of monitors that will start going off if a server response time exceeds 60 seconds. And since all that activity is on the backend it’s easier to isolate the cause. Is it that pagers don’t go off when clientside page load times exceed 60 seconds? That’s hard to believe, but perhaps that’s the case.
Perhaps it’s the way page load times are tracked. If you’re looking at worldwide medians, or even averages, and China isn’t a major audience, your page load time stats might not exceed alert levels when frontend SPOF happens. Or maybe page load times are mostly tracked using synthetic testing, and those user agents aren’t subjected to real world issues like the Great Firewall.
One thing website owners can do is ignore frontend SPOF until it’s triggered by some future outage. A quick calculation shows this is a scary choice. If a 3rd party widget has a 99.99% uptime and a website has five widgets that aren’t async, the probability of frontend SPOF is 0.05%. If we drop uptime to 99.9% the probability of frontend SPOF climbs to 0.5%. Five widgets might be high, but remember that “third party widget” includes ads and metrics. Also, the website’s own resources can cause frontend SPOF which brings the number even higher. The average website today contains 14 scripts any of which could cause frontend SPOF if they’re not loaded async.
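These figures follow from treating widget outages as independent: the chance that at least one of n blocking widgets is down is 1 - uptime^n. A short sketch that reproduces them; independence is an assumption (real outages can be correlated), and the 14-script line simply applies the 99.9% figure to every script for illustration.

```javascript
// Probability that at least one of n blocking widgets is unavailable,
// assuming each has the same uptime and outages are independent.
function spofProbability(uptime, n) {
  return 1 - Math.pow(uptime, n);
}

const pct = (p) => (p * 100).toFixed(2) + "%";
console.log(pct(spofProbability(0.9999, 5))); // 0.05%
console.log(pct(spofProbability(0.999, 5))); // 0.50%
// Illustration only: 14 blocking scripts, each assumed to have 99.9% uptime.
console.log(pct(spofProbability(0.999, 14))); // 1.39%
```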
Frontend SPOF is a real problem that needs more attention. Website owners should use async snippets and patterns, monitor real user page load times, and look beyond averages to 95th percentiles and standard deviations. Doing these things will mitigate the risk of subjecting users to the dreaded blank white page. A chain is only as strong as its weakest link. What’s your website’s weakest link? There’s a lot of focus on backend resiliency. I’ll wager your weakest link is on the frontend.
[Originally posted as part of PerfPlanet's Performance Calendar 2011.]
My previous blog post, Cache them if you can, suggests that current cache sizes are too small – especially on mobile.
Given this concern about cache size a relevant question is:
If a response is compressed, does the browser save it compressed or uncompressed?
Compression typically reduces responses by 70%. This means that a browser can cache 3x as many compressed responses if they’re saved in their compressed format.
Note that not all responses are compressed. Images make up the largest number of resources but shouldn’t be compressed. On the other hand, HTML documents, scripts, and stylesheets should be compressed and account for 30% of all requests. Being able to save 3x as many of these responses to cache could have a significant impact on cache hit rates.
It’s difficult and time-consuming to determine whether compressed responses are saved in compressed format. I created this Caching Gzip Test page to help determine browser behavior. It has two 200 KB scripts – one is compressed down to ~148 KB and the other is uncompressed. (Note that this file contains random strings, so the compression savings are only 25% compared to the typical 70%.) After clearing the cache and loading the test page, if the total cache disk size increases by ~348 KB, it means the browser saves compressed responses as compressed. If the total cache disk size increases by ~400 KB, it means compressed responses are saved uncompressed.
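Both the expected cache growth in this experiment and the "3x" capacity claim above are simple arithmetic (1 / (1 - 0.7) ≈ 3.3). A small sketch encoding both; the 148 KB and 200 KB sizes come from the test page description, and classifying a measurement by "whichever expectation is closer" is a simplification, since real cache entries also carry headers and metadata.

```javascript
// Sizes (KB) quoted for the Caching Gzip Test page; both are approximate.
const UNCOMPRESSED_KB = 200; // script served without gzip
const GZIPPED_KB = 148; // the other 200 KB script after gzip

// Expected cache growth after one load of the test page. The plain script
// always takes ~200 KB; the gzipped one takes ~148 KB only if the browser
// stores it in the compressed form it was received in.
function expectedGrowthKB(storesCompressed) {
  return UNCOMPRESSED_KB + (storesCompressed ? GZIPPED_KB : UNCOMPRESSED_KB);
}

// Simplification: pick whichever expectation the measurement is closer to.
function interpretGrowth(measuredKB) {
  const ifCompressed = expectedGrowthKB(true); // 348
  const ifUncompressed = expectedGrowthKB(false); // 400
  return Math.abs(measuredKB - ifCompressed) <= Math.abs(measuredKB - ifUncompressed)
    ? "cached compressed"
    : "cached uncompressed";
}

// How many times more compressible responses fit when stored compressed.
const capacityGain = (savings) => 1 / (1 - savings);

console.log(interpretGrowth(352)); // cached compressed
console.log(capacityGain(0.7).toFixed(1)); // 3.3
```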
The challenging part of this experiment is finding where the cache is stored and measuring the response sizes. Firefox, Chrome, and Opera save responses as files and were easy to measure. For IE on Windows I wasn’t able to access the individual cache files (admin permissions?) but was able to measure the sizes based on the properties of the Temporary Internet Files folder. Safari saves all responses in Cache.db. I was able to see the incremental increase by modifying the experiment to be two pages: the compressed response and the uncompressed response. You can see the cache file locations and full details in the Caching Gzip Test Results page.
Here are the results for top desktop browsers:
| Browser | Compressed responses cached compressed? | Max cache size |
|---|---|---|
| Chrome 17 | yes | 320 MB* |
| Firefox 11 | yes | 850 MB* |
| IE 8 | no | 50 MB |
| IE 9 | no | 250 MB |
| Safari 5.1.2 | no | unknown |
| Opera 11 | yes | 20 MB |
* Chrome and Firefox cache size is a percentage of available disk space. Chrome is capped at 320 MB. I don’t know what Firefox’s cap is; on my laptop with 50 GB free the cache size is 830 MB.
We see that Chrome 17, Firefox 11, and Opera 11 store compressed responses in compressed format, while IE 8&9 and Safari 5 save them uncompressed. IE 8&9 have smaller cache sizes, so the fact that they uncompress responses before caching further reduces the number of responses that can be cached.
What’s the best choice? It’s possible that reading cached responses is faster if they’re already uncompressed. That would be a good next step to explore. I wouldn’t prejudge IE’s choice when it comes to performance on Windows. But it’s clear that saving compressed responses in compressed format increases the number of responses that can be cached, and this increases cache hit rates. What’s even clearer is that browsers don’t agree on the best answer. Should they?
Why do we have to bother about built-in GTID support in MySQL 5.6 at all? Sure, it is a tremendous step forward for a lazy primary copy system like MySQL Replication. Period. GTIDs make server-side failover easier (slides). And load balancers, including PECL/mysqlnd_ms as an example of a driver-integrated load balancer, can use them to provide session consistency. Please see the slides. But…
… the primary remains a single point of failure. GTIDs can be described as cluster-wide transaction counters generated on the master. In case of a master failure, the slave that has replicated the highest transaction counter shall be promoted to become the master. It's the most current slave. Failover made easy - no doubt! With an adequate deployment, you should reach very reasonable availability.
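A minimal sketch of the promotion rule just described, using the same simplification of a GTID as a single increasing counter per slave. Real MySQL 5.6 GTIDs are sets of server_uuid:transaction_id ranges, so treat this as an illustration only; the host names are hypothetical.

```javascript
// Simplified model: each slave reports the highest master transaction
// counter it has applied. On ties, the first slave listed wins.
function pickPromotionCandidate(slaves) {
  if (slaves.length === 0) throw new Error("no slaves available");
  return slaves.reduce((best, s) => (s.appliedTrx > best.appliedTrx ? s : best));
}

const slaves = [
  { host: "slave1", appliedTrx: 1041 },
  { host: "slave2", appliedTrx: 1045 },
  { host: "slave3", appliedTrx: 1039 },
];
console.log(pickPromotionCandidate(slaves).host); // slave2
```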
Know the limits of replicated systems
A multi-master (update anywhere) design does not have a single point of failure. But it has challenges of its own, and among the biggest is scaling a multi-master solution. Jim Gray and Pat Helland concluded in 1996 in "The Dangers of Replication and a Solution": Update anywhere-anytime-anyway transactional replication has unstable behavior as the workload scales up: a ten-fold increase in nodes and traffic gives a thousand fold increase in deadlocks or reconciliations. N^3 - buuuhhhh, anything worse than linear scale is not appreciated. Guess what: Microsoft SQL Azure is using primary copy combined with partitioning.
In practice things are not that bad, particularly not for a small number of nodes with recent algorithms. For example, MySQL Cluster (related webinar on March 29) is a true multi-master solution - even eager/synchronous. To overcome the write-scale limitations it has built-in partitioning (sharding). The two classical scale-out solutions - replication and partitioning - are combined in one product. If you want extreme performance and are ready to pay the costs of partitioning… try it.
Anything to learn from the NoSQL kids on the block?
Some of the other kids offer relaxed eventual consistency, just as MySQL Replication does. Sometimes the CAP theorem is cited as an excuse for it. Some leave conflict resolution, even conflict detection, to the application developer. A massively scalable, highly available, synchronous update-anywhere solution with built-in conflict resolution - the big thing we all dream of - is hard to create.
In the meanwhile… - maybe cluster-aware APIs
While we all wait for the one-size-fits-all solution, there is something we can do. We can start to tell our load balancers precisely what we need and request no higher level of service than needed. Consistency - as in CAP - is one aspect of service quality. We should start to have cluster-aware APIs abstracting the details of replication architectures. Then our load balancers, including PECL/mysqlnd_ms, can hide everything that makes working with a cluster complicated (connection pooling, request splitting and redirection, failover, node selection, load distribution, …). Also, vendors can start to play with consistency to improve performance without messing up application logic.
Below is how you use the PECL/mysqlnd_ms 1.2+ function mysqlnd_ms_set_qos() to switch between eventual consistency (stale data allowed) and session consistency (read-your-writes). The MySQL Replication details are hidden behind a function call.
<?php
/* "myapp" names a section of the PECL/mysqlnd_ms configuration file, not a host */
$mysqli = new mysqli("myapp", "username", "password", "database");
if ($mysqli->connect_errno) {
    /* Of course, your error handling is nicer... */
    die(sprintf("[%d] %s\n", $mysqli->connect_errno, $mysqli->connect_error));
}
/* read-write splitting: master used */
if (!$mysqli->query("INSERT INTO orders(order_id, item) VALUES (1, 'christmas tree, 1.8m')")) {
    /* Please use better error handling in your code */
    die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
}
/* Request session consistency: read your writes */
if (!mysqlnd_ms_set_qos($mysqli, MYSQLND_MS_QOS_CONSISTENCY_SESSION)) {
    die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
}
/* Plugin picks a node which has the changes, here: master */
if (!$res = $mysqli->query("SELECT item FROM orders WHERE order_id = 1")) {
    die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
}
var_dump($res->fetch_assoc());
/* Back to eventual consistency: stale data allowed */
if (!mysqlnd_ms_set_qos($mysqli, MYSQLND_MS_QOS_CONSISTENCY_EVENTUAL)) {
    die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
}
/* Plugin picks any slave, stale data is allowed */
if (!$res = $mysqli->query("SELECT item, price FROM specials")) {
    die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
}
var_dump($res->fetch_assoc());
GTID for clients? Buzz alarm!
PECL/mysqlnd_ms 1.3 does not bring any groundbreaking changes with regard to consistency or GTIDs. For session consistency, it can now use either the driver's built-in GTID emulation (1.2+) or the server-side GTID feature (1.3+, MySQL 5.6). That's all. I confess, the slide title is pure buzz. But there is some truth in every tale.
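Why do GTIDs matter for session consistency? With a global transaction ID recorded after the client's last write, a load balancer can ask each slave whether it has already applied that transaction and send reads to any slave that has, instead of always falling back to the master. The sketch below illustrates the idea under a simplifying assumption: transaction IDs are a plain, monotonically increasing counter, similar in spirit to a client-side emulation. Real MySQL 5.6 GTIDs are sets of server_uuid:transaction_id ranges and would need a set comparison instead. The functions slavesWithWrite() and pickNode() are hypothetical, not part of PECL/mysqlnd_ms.

```php
<?php
// Hypothetical sketch: filter slaves for session consistency using a
// counter-style global transaction ID (a simplification of real GTIDs).

/**
 * @param int                $lastWriteGtid counter value recorded after the client's last write
 * @param array<string, int> $slaveGtids    slave name => highest counter the slave has applied
 * @return string[]                          slaves that already have the client's writes
 */
function slavesWithWrite(int $lastWriteGtid, array $slaveGtids): array
{
    $eligible = [];
    foreach ($slaveGtids as $slave => $applied) {
        if ($applied >= $lastWriteGtid) {
            $eligible[] = $slave;
        }
    }
    return $eligible;
}

/** @param array<string, int> $slaveGtids */
function pickNode(string $master, int $lastWriteGtid, array $slaveGtids): string
{
    $eligible = slavesWithWrite($lastWriteGtid, $slaveGtids);
    // No slave has caught up yet: only the master is guaranteed to have the write.
    return $eligible === [] ? $master : $eligible[array_rand($eligible)];
}
```

Compared with the master-only fallback, this keeps read load on the slaves whenever replication lag allows it.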
Cluster-aware APIs and better load balancers? Follow up!
I'm convinced that good load balancers can make application developers' lives much easier. Read-your-writes, or session consistency, is an example of how new API calls may come in handy. Transparently replacing remote slave accesses with client-side cache accesses (coming with 1.3) is an example of how load balancers can optimize overall cluster performance.
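The cache idea can be sketched as follows: under eventual consistency, a read may be answered from a local cache entry as long as that entry is not older than the staleness the application accepts, saving a network round trip to a slave. This is a minimal illustration, not how PECL/mysqlnd_ms 1.3 implements it; the class StaleReadCache, its TTL parameter and the injectable $now timestamp are hypothetical, and PHP 8.0+ is assumed.

```php
<?php
// Hypothetical sketch: serve eventually consistent reads from a local
// TTL cache instead of a remote slave.

final class StaleReadCache
{
    /** @var array<string, array{0: float, 1: mixed}> query => [stored_at, result] */
    private array $entries = [];

    public function __construct(private float $ttlSeconds) {}

    /**
     * @param callable(string): mixed $fetchFromSlave performs the real (remote) read
     * @param float|null              $now            current time, injectable for testing
     */
    public function read(string $sql, callable $fetchFromSlave, ?float $now = null): mixed
    {
        $now ??= microtime(true);
        if (isset($this->entries[$sql]) && $now - $this->entries[$sql][0] < $this->ttlSeconds) {
            return $this->entries[$sql][1]; // cache hit: no round trip to a slave
        }
        $result = $fetchFromSlave($sql);    // miss or expired: read from a slave
        $this->entries[$sql] = [$now, $result];
        return $result;
    }
}
```

The TTL plays the same role as the replication lag the application already tolerates: if stale data is acceptable anyway, a local copy of bounded age is as good as a lagging slave.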
Whoever designs a replication solution in 2012 should include the load balancer in their considerations… even for multi-master.
Happy hacking!