I was recently asked what maximal number of transactions per second we can get with MySQL and the XtraDB / InnoDB storage engine on a high-end server. Good question, though not easy to answer, as it depends on:
- the durability setting ( innodb_flush_log_at_trx_commit = 0 or 1 )
- whether we use binary logs ( I used row-based replication for 5.1 )
- whether we use the sync_binlog option
So why not take these as variable parameters and run a simple benchmark?
I took the sysbench update_key scenario ( updating an indexed field on a simple table )
and used a Dell PowerEdge R900 with 16 cores, with FusionIO as storage for the table and RAID 10 with BBU as storage for the InnoDB log files, the InnoDB system tablespace and the binary logs. I used Percona-XtraDB-5.1.43-9.1 for the benchmarks. All partitions involved are formatted with XFS and mounted with the nobarrier option.
I ran update_key with various thread counts and the following parameters:
- trx_commit=0 : innodb_flush_log_at_trx_commit = 0 and no binary logs
- trx_commit=1 : innodb_flush_log_at_trx_commit = 1 and no binary logs
- trx_commit=0 & binlog : innodb_flush_log_at_trx_commit = 0 and binary logs
- trx_commit=1 & binlog : innodb_flush_log_at_trx_commit = 1 and binary logs
- trx_commit=1 & binlog & sync_bin : innodb_flush_log_at_trx_commit = 1 and binary logs and sync_binlog=1
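For reference, the five configurations map onto my.cnf settings roughly like this. This is a sketch showing only the three flags under test; every other server setting stays fixed across runs:

```ini
[mysqld]
# trx_commit=1 & binlog & sync_bin -- the most durable combination
innodb_flush_log_at_trx_commit = 1   # fsync the InnoDB log on every commit
log-bin                        = mysql-bin   # enable the binary log
sync_binlog                    = 1   # fsync the binary log on every commit

# The other runs vary these flags:
#   trx_commit=0          -> innodb_flush_log_at_trx_commit = 0, log-bin off
#   trx_commit=1          -> innodb_flush_log_at_trx_commit = 1, log-bin off
#   trx_commit=0 & binlog -> innodb_flush_log_at_trx_commit = 0, log-bin on, sync_binlog = 0
#   trx_commit=1 & binlog -> innodb_flush_log_at_trx_commit = 1, log-bin on, sync_binlog = 0
```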
I found the results quite interesting.
With innodb_flush_log_at_trx_commit = 0 the maximal throughput is 36332.02 tps, which drops to 23115.04 tps as
we switch to innodb_flush_log_at_trx_commit = 1. Since we use RAID 10 with BBU, I did not expect the drop to be so significant. In the second case InnoDB has to flush the log to disk on every commit.
Enabling binary logs drops the result to 17451.01 tps with innodb_flush_log_at_trx_commit = 0 and to 12097.39 tps with innodb_flush_log_at_trx_commit = 1. So with binary logs the serialization gets even worse.
Enabling sync_binlog makes things really bad, and the maximal result I got is
3086.7 tps. So you should decide whether binary log protection is worth such a drop.
UPDATE ( 3/4/2010 )
Entry posted by Vadim
I think that MySQL+memcached is still the default choice and I don't think it is going away in the high-scale market. But some high-scale applications either don't need all of the features of a SQL RDBMS or are willing to go without those features to scale. This isn't a blanket endorsement of NoSQL as the definition of NoSQL is weak. I am referring to the NoSQL systems that support high-scale.
I don't believe all of the bad press that MySQL receives from high-scale applications. I know that some problems with MySQL are self-inflicted (seriously, I know this). It is hard to diagnose many problems for which the primary symptom is a slow MySQL server, so it is also hard to identify self-inflicted problems. I also don't think that some NoSQL systems will provide a different scale-out experience than MySQL, given that some NoSQL systems scale out by sharding (just like MySQL) and that I can deploy MySQL like NoSQL (disallow joins and secondary indexes, use HANDLER statements).
I also wonder whether affordable SSD/Flash reduces the need to migrate from MySQL to NoSQL. Many MySQL deployments that were IO bound when it was difficult to get more than a few thousand IOPs on a commodity server can now get 10,000 to 100,000 IOPs in that server at commodity prices.
MySQL and NoSQL are also at significantly different stages. MySQL is mature and maturity has its benefits. MySQL has amazing support and documentation. There are client libraries for almost every language that you should use. There are even bindings for languages you shouldn't use. The MySQL C API is easy to use. The JDBC driver is awesome, even if support for JDBC makes it much more complex than needed. There is a lot of MySQL expertise that can be hired or rented (MySQL, Monty Program, Percona, Open Query, Pythian, FromDual) and there is some innovation (not enough companies, but they are doing amazing things) from third-parties such as InfiniDB, InfoBright and TokuDB.
NoSQL systems are improving faster than MySQL. MySQL has focused on features for the enterprise RDBMS market in the past two releases, and the changes we need from MySQL are hard to implement. Change is hard because MySQL is a complex server that supports many features. Change is also much harder than it should be because of the MySQL coding style. Parts of it are not modular and features are entangled. Some of the difficulty could be overcome if there were interest from external contributors. There are external contributors willing and able to improve the server code, but they are working on other projects like NoSQL. The MySQL effort is also split (or diluted) between official MySQL, Drizzle and MariaDB.
What really happened?
I don't know. It may have been better for the business of MySQL to focus on the enterprise market. I can describe some of the problems that need to be fixed in MySQL to make things easier for me. I think other high-scale applications share these problems:
- Multi-master - high-scale applications have users around the world. Latency is reduced by distributing databases and application servers around the world. Databases are rarely sharded by location so the data store must support multi-master deployments with conflict resolution and eventual consistency for some database tables. There is no support for conflict resolution in MySQL. It might be possible to do something with the output of row-based replication.
- SQL - this is a problem that MySQL cannot fix. SQL makes it easy to make mistakes. Mistakes include insert, update and delete statements that lock all rows in a table. Alas, the EXPLAIN statement in MySQL does not support insert, update and delete statements. Another serious mistake is a query that has a lousy response time when the database buffer cache is cold because it does many random disk reads. The EXPLAIN statement in MySQL does not provide an estimate for the worst-case number of random disk IOs and many people who write SQL don't know how to interpret it to get an estimate. Worst-case performance is critical for queries run during web requests.
- Write-optimization - several NoSQL systems are write-optimized including Cassandra, HBase and Bigtable. A write-optimized system makes it possible to use more indexes than an update-in-place data store. With more indexes it is more likely that there can be an index defined for every popular query and the index reduces the number of disk reads that must be done to evaluate the query. This improves worst-case query response time and reduces the need to use memcached or a huge database buffer cache. Write-optimization has finally arrived for MySQL with the availability of TokuDB. I hope that RethinkDB provides a GA version in the future.
- Monitoring - without good monitoring you will either spend too much time fixing performance problems or never find them and buy too much hardware. I suspect that monitoring in MySQL is much better than anything in a NoSQL system, but MySQL is missing features that make it easy to understand current and new sources of workload. I need to aggregate the overhead (CPU time, disk operations, rows read, ...) by database user, table and statement. It is extremely hard or not possible to do this by database user and table. It became possible in MySQL 5.1 to do this by statement for short periods of time by using the slow query log. Prior to MySQL 5.1, the slow query log was limited to queries that ran for at least two seconds. The alternative is to use tcpdump with mk-query-digest. Despite all of the work that has gone into the performance schema, MySQL has yet to support anything like my favorite feature -- user and table monitoring.
- Crash proof slaves - replication slaves are not crash proof. The slave commits transactions to a storage engine and then updates a state file to maintain the replication offset. There is nothing to keep the state file and storage engine in sync. Until recently there wasn't even an option to force the state file to disk after it was updated. Unplanned hardware reboots are frequent when there are hundreds or thousands of slaves. If you are clever and use the right version of InnoDB it is possible in some cases to figure out the correct offset for the slave after a crash and repair it manually. This isn't a good use of DBA time. Otherwise DBAs must waste their time and network bandwidth to restore the slaves. The Google patch published two different fixes for this: rpl_transaction_enabled and global transaction IDs. MySQL is working on a fix.
- Automated failover - for MySQL deployments that have many slaves connected to one master it isn't possible to automate failover when a master crashes. Tungsten and DRBD might make this better.
- Resharding - sharding is an excellent way to scale MySQL. Sharding usually requires resharding. Resharding is hard and must be done with minimal downtime. It might be possible to build a tool that uses row-based replication output to reshard a database in the background with little downtime. No such tool exists today.
- Replication lag - a slave with replication lag is useless for OLTP scaleout. The replication thread is single-threaded. MySQL is working on support for parallel execution on a slave. Until then we need to improve mk-slave-prefetch (Domas, can you hear me?).
- Schema change - these are frequently needed for growing high-scale applications. Long running schema changes in MySQL require downtime unless first done on a slave (assuming you have a spare slave and that slave can become the master after the change). Users don't like downtime. I think it is possible to do many of these on a master with minimal downtime using the output from row-based replication. Alas, there is no tool for that today.
Do your homework when evaluating a NoSQL system as they differ greatly from each other:
- Crash safety - most NoSQL systems are crash safe but a few are not. I would limit the use of systems that are not crash safe to supporting batch workload. Unplanned server reboots are frequent for high-scale applications when a large number of servers is used. At least two prominent members of the NoSQL family are not crash safe. That should be documented in bold text on their project pages. It is not.
- Sharding - some NoSQL systems do sharding. BigTable and others do not. With sharding it is possible to support transactions and multiple indexes on a table within the scope of a shard. That then requires support for resharding. It also requires that queries on secondary indexes be run on all shards, while queries on primary indexes can be limited to one shard, which may limit the ability to use secondary indexes.
- Index types - Many NoSQL systems are limited to hash indexes. You can't do range scans on hash indexes. I wonder whether this leads to data redundancy when every query must be resolved by one index lookup.
- Secondary indexes - NoSQL systems like BigTable not only do not support transactions, they also do not support secondary indexes. You can explicitly maintain a secondary index but there is no support to make multiple-changes atomic and there can be a failure between the primary and secondary index updates which results in data drift. It is also difficult to do consistent reads between the two.
- Consistent reads - consistency is usually the responsibility of the client and done by using per-row timestamps.
- Single-node performance - I know that performance != scale-out, but scale-out is not a substitute for lousy single-node performance in the high-scale application market. It might be acceptable to use 5X as many nodes because your data store is slow when that means 40 nodes. It becomes a show-stopper when that means thousands of nodes. One NoSQL system has accepted this compromise. While I know it has many other use cases, I think this will limit its use for high-scale applications.
- Network efficiency - MySQL reduces use of the network because all query evaluation is done at the server. All NoSQL systems evaluate predicates implied by the index access. Only some NoSQL systems evaluate non-indexed predicates. This can result in more data returned to the client.
- Technology or solution - MySQL is more mature than the NoSQL systems. A lot of work remains to grow NoSQL from technology into solution with support for audit, backup, monitoring and all of the other things required to scale in a large company.
Git's branching model is one of its best features. Branches are cheap, fast and extremely flexible. They're great for developing features, maintaining old releases, or just plain experimentation.
If you spend a lot of time with git, you'll also find that there's a lot of really useful information to be discovered in the way git can compare various branches. We're using some of this information to generate our new branch list page (you can get to this page from the 'Branch List' subnav item under Source).
Check in on your topic branches in one glance
Not only do these new branch list pages show you which branches exist on your remote, but you can see at a glance how they compare to any branch.
Each branch has what we call a divergence graph. On the left side of the black bar we show how many commits that branch is behind (commits in master not found in the branch). On the right side, we show how many commits that branch is ahead (commits found in the branch, but not in master). The colors of the bars indicate how recent the last commit was.
In that one graphic, you get an idea of when the last time each branch was updated with master, how far along that branch is, and if people have been working on it recently.
These graphs also have some implicit rules that can help when merging:
- No left side bar - This branch is safe to merge with your base branch and there will not be any conflicts.
- No right side bar - This branch has already been merged into your base branch, so it's probably ready to be deleted.
- Large left and right side bars - This branch is probably going to be difficult to merge. There are many unique commits in the branch and in your base branch, so the likelihood of merge conflicts is much higher.
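The behind/ahead numbers in the divergence graph are what plain git reports for the symmetric difference between two branches. A runnable sketch using a throwaway repository with made-up branch names (the -b flag to git init needs git 2.28 or newer):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b master demo && cd demo
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "base"          # common ancestor
git branch topic                               # topic forks here
git commit -q --allow-empty -m "on master"     # 1 commit only in master
git checkout -q topic
git commit -q --allow-empty -m "topic work 1"  # 2 commits only in topic
git commit -q --allow-empty -m "topic work 2"
# left count = behind (commits only in master),
# right count = ahead (commits only in topic)
git rev-list --left-right --count master...topic
```

The two counts printed are exactly the left and right sides of the divergence bar: here the topic branch is 1 commit behind and 2 commits ahead of master.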
Having fun with Rails releases
This view can also be fun for gleaning information from Rails releases. Rails keeps a branch for each point release. If we take a look at the branches with 2.2 as the base, we get a pretty interesting page.
Using the divergence graphs, we can see how each point release of Rails compares to the 2.2 release. You can also see that there were almost as many commits from 1.2->2.2 as there have been from 2.2->master (Rails 3 beta).
The last piece of the branch lists page is the compare button on each branch. This is an awesome feature--but I think I'll leave it to Ryan to explain in more detail.
Picking up where Kyle left off in his Branch List post, we're all very excited to announce a new feature designed to ease the process of comparing two points in a repository's history. It's called GitHub Compare View and it's going to change the way you review code.
The Compare View brings all information needed to determine what changed over a series of commits onto a single page: a condensed commit list in chronological order, followed by a rollup diff of all changes between the two points, followed by any relevant commit comments. All in the same place and with a single well-defined URL.
It's a versatile feature with many potential uses. Some of our favorites are detailed below with links to live examples of Compare View in action.
Reviewing topic branches before merging
Reviewing topic branches is a fundamental activity for anyone maintaining an active open source project or working with a team on a private project. Until now, the easiest way to do a thorough review of a complex topic branch was to drop down to the shell and run some combination of the git cherry and git diff commands. With Compare View, we've taken that process and put it behind a single URL, so jumping into the review process is usually as simple as following a link.
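For comparison, the pieces Compare View rolls into one page can still be produced at the shell. A runnable sketch against a throwaway repository (the file and branch names are made up; the -b flag to git init needs git 2.28 or newer):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b master demo && cd demo
git config user.email demo@example.com
git config user.name demo
echo "v1" > app.txt
git add app.txt && git commit -q -m "base"
git checkout -q -b topic
echo "v2" > app.txt
git add app.txt && git commit -q -m "change app"
# The three views Compare View combines into one page:
git log --oneline master..topic   # condensed commit list
git diff master...topic           # rollup diff since the merge base
git cherry -v master topic        # commits in topic not yet upstream
```

Note the two range syntaxes: master..topic lists commits reachable from topic but not master, while master...topic in git diff compares against the merge base, which is what the rollup diff shows.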
Follow the examples below to see it live:
- jQuery require support branch (pictured above)
- Mustache pragma support branch
- Homebrew Pathname.rename branch
Generating comprehensive change logs for releases
While Compare View was initially designed to supplement our code review process, we quickly found that it was useful in a variety of other scenarios. Projects that are diligent about tagging can use Compare View to generate a comprehensive list of changes between any two releases.
A few examples in the wild:
- Changes between the Sinatra 0.9.4 and 1.0.a releases (pictured above)
- Changes between the Rails 3.0 beta and current master
- Changes slated for the next version of Git
How to customize the commit range
The commit range determines the starting and ending point to use in the comparison. It looks like this:
Here, we're reviewing the proxy_owner topic branch using the master branch as the starting point. Hitting the button switches the starting and ending points. Clicking the starting or ending point label brings up the ref selector:
Enter any branch, tag, or commit SHA1 in the little box there and get an instant preview of the newly selected revision. Hit Save & Refresh once you're happy with the selection.
We've seen what the Compare View feels like and how to change the commit range once there, but how do you get to a Compare View in the first place?
The Branch List page. Click the Compare button next to any branch to jump into a Compare View with that branch as the ending point. See Kyle's post introducing the Branch List page for more information.
Push and Branch Create events. All push events with more than one commit now link to a Compare View over all commits included in the push. All create branch events now link to a Compare View between the repository's default branch (typically master) and the branch head. This affects dashboards, repository timelines, and activity feeds.
Service Hooks. The IRC and Campfire service hooks now drop a Compare View link when more than one commit is included in the push. More service hooks will follow.
Compare View URLs
We wanted Compare View to be something we could link to from external sites and services anytime we were referring to a range of commits in a git repository. As such, we thought it would be worthwhile to document the basic structure of a Compare View URL:
http://github.com/<USER>/<REPO>/compare/<START>...<END>
<USER> and <REPO> are obvious, and <START> and <END> are branch names, tag names, or commit SHA1s specifying the range of history to compare. If <START> is omitted, the repository's default branch is assumed.
Compare View is the first of many code review related features we plan to introduce this year. We'll be incorporating Compare View into other areas of the site and developing entirely new features with Compare View as a core component.
That being said, we felt it was important that the basic task of comparing two points in a repository's history stand on its own and have a well defined URL so that the feature is useful in a broad number of circumstances. If everything goes to plan, you'll be seeing Compare View links tossed around on mailing lists, forums, IRC channels, Campfire, blog posts, release announcements, issue trackers, etc. in the very near future.