Hi Folks,
I'm pleased to announce that after some reflection, Yahoo! has decided to discontinue the "The Yahoo Distribution of Hadoop" and focus on Apache Hadoop. We plan to remove all references to a Yahoo distribution from our website (developer.yahoo.com/hadoop), close our github repo (yahoo.github.com/hadoop-common) and focus on working more closely with the Apache community. Our intent is to return to helping Apache produce binary releases of Apache Hadoop that are so bullet proof that Yahoo and other production Hadoop users can run them unpatched on their clusters.
Until Hadoop 0.20, Yahoo committers worked as release masters to produce binary Apache Hadoop releases that the entire community used on their clusters. As the community grew, we experimented with using the "Yahoo! Distribution of Hadoop" as the vehicle to share our work. Unfortunately, Apache is no longer the obvious place to go for Hadoop releases. The Yahoo! team wants to return to a world where anyone can download and directly use releases of Hadoop from Apache. We want to contribute to the stabilization and testing of those releases. We also want to share our regular program of sustaining engineering that backports minor feature enhancements into new dot releases on a regular basis, so that the world sees regular improvements coming from Apache every few months, not years.
Recently the Apache Hadoop community has been very turbulent. Over the last few months we have been developing Hadoop enhancements in our internal git repository while doing a complete review of our options. Our commitment to open sourcing our work was never in doubt, but the future of the "Yahoo Distribution of Hadoop" was far from clear. We've concluded that focusing on Apache Hadoop is the way forward. We believe that more focus on communicating our goals to the Apache Hadoop community, and more willingness to compromise on how we get to those goals, will help us get back to making Hadoop even better.
Unfortunately, we now have to sort out how to contribute several person-years worth of work to Apache to let us unwind the Yahoo! git repositories. We currently run two lines of Hadoop development, our sustaining program (hadoop-0.20-sustaining) and hadoop-future. Hadoop-0.20-sustaining is the stable version of Hadoop we currently run on Yahoo's 40,000 nodes. It contains a series of fixes and enhancements that are all backwards compatible with our "Hadoop 0.20 with security". It is our most stable and high performance release of Hadoop ever. We've expended a lot of energy finding and fixing bugs in it this year. We have initiated the process of contributing this work to Apache in the branch: hadoop/common/branches/branch-0.20-security. We've proposed calling this the 20.100 release. Once folks have had a chance to try this out and we've had a chance to respond to their feedback, we plan to create 20.100 release candidates and ask the community to vote on making them Apache releases.
Hadoop-future is our new feature branch. We are working on a set of new features for Hadoop to improve its availability, scalability and interoperability to make Hadoop more usable in mission critical deployments. You're going to see another burst of email activity from us as we work to get hadoop-future patches socialized, reviewed and checked in. These bulk checkins are exceptional. They are the result of us striving to be more transparent. Once we've merged our hadoop-future and hadoop-0.20-sustaining work back into Apache, folks can expect us to return to our regular development cadence. Looking forward, we plan to socialize our roadmaps regularly, actively synchronize our work with other active Hadoop contributors and develop our code collaboratively, directly in Apache.
In summary, our decision to discontinue the "Yahoo! Distribution of Hadoop" is a commitment to working more effectively with the Apache Hadoop community. Our goal is to make Apache Hadoop THE open source platform for big data.
Thanks,
E14
--
PS Here is a draft list of key features in hadoop-future:
-
HDFS-1052 - Federation, the ability to support much more storage per Hadoop cluster.
-
HADOOP-6728 - A the new metrics framework
-
MAPREDUCE-1220 - Optimizations for small jobs
At midnight on the morning of 1st Feb 2011 IANA announced two more /8's allocated to APNIC (39/8 and 106/8) leaving the final five /8s, one for each of the RIRs.
The policy is that these final five /8s are allocated to the five RIRs immediately, and we expect this to be formally announced 14:30 Thursday.
So, no more allocations from IANA after that!
(Well, technically, there is a pool for returned allocations, but that is going to be a tad rare)
http://www.apnic.net/publications/news/2011/leading-global-internet-groups-make-significant-announcement-about-the-status-of-the-ipv4-address-pool
Sometimes you just need some data to test and stress things. But randomly generated data is awful — it doesn’t have realistic distributions, and it isn’t easy to understand whether your results are meaningful and correct. Real or quasi-real data is best. Whether you’re looking for a couple of megabytes or many terabytes, the following sources of data might help you benchmark and test under more realistic conditions.
- The venerable sakila test database: small, fake database of movies.
- The employees test database: small, fake database of employees.
- The Wikipedia page-view statistics database: large, real website traffic data.
- The IMDB database: moderately large, real database of movies.
- The FlightStats database: flight on-time arrival data, easy to import into MySQL.
- The Bureau of Transportation Statistics: airline on-time data, downloadable in customizable ways.
- The airline on-time performance and causes of delays data from data.gov: ditto.
- The statistical review of world energy from British Petroleum: real data about our energy usage.
- The Amazon AWS Public Data Sets: a large variety of data such as the mapping of the Human Genome and the US Census data.
- The Weather Underground weather data: customize and download as CSV files.
Post your favorites in the comments!
Ideally we want to be supplying *only* IPv6 capable routers by the end of this month. A challenge I know.
Watch this space...
IPv4 finally hitting the end stops - so what next?
Well, lets look at IPv6 first. It has suffered from chicken and egg syndrome a lot for many years - with people not deploying IPv6 services as there are no users, and users not seeing any point as there are no IPv6 services...
Can that work for us now? Can the fact we now have an egg help create chickens faster? I hope so. I hope that momentum will mean IPv6 take up goes really quickly. I hope in a years time we are looking back and laughing at it all. I don't know, and I worry that IPv4 will some how cling on to the bitter end with NAT and mapping and SRV records and all sorts somehow keeping it going... Let's hope not.
We know the big stumbling block has been IPv6 capable consumer routers. At last it is happening. Give it a few months and you will not be able to buy a DSL router that does not do IPv6. And bear in mind, people do not keep this kit for years - you are lucky when things last longer than a 12 month warranty these days. So a couple of years and all end users will, by simple updates, be using an IPv6 capable router.
The ISPs are not daft either. They know they have to move now, and will. Thankfully, whilst it may be months of planning, it is not that hard. Normal maintenance and a few months work and some equipment and systems upgrades... Give it 6 months and ISPs will be IPv6 ready - they have to be.
At that point you have consumers everywhere that happen to have IPv6, without planning it or thinking about it - it will just happen naturally. People won't even realise it has happened and won't realise that www.google.co.uk is now working via IPv6. I will be surprised if this is not within 2 years.
Then, we have a case where IPv6 only services are really viable. Until then it is tricky as some people can't get to you. But there will be a tipping point when IPv6 is no problem. We saw it with browsers - there was a point when any sensible web site did not try and use javascript - now you can safely use it anywhere.
So, that is the rise of IPv6, all looking good and just a question of how long it takes.
What of IPv4? This is where I am really pondering. There are around 3.5 billion IPv4 addresses out there and they are now a limited resource. The last gold mine has been mined dry. What happens to that?
A simple and obvious thing that has to happen right away is the value of IPv4 addresses rises. Until today they were free if you needed them (excepting membership fees). Any higher value was speculative. Now you cannot get them for free any more. They have value. But what does that mean?
Everyone providing any services that consume IPv4 addresses will have to consider the price for that service. It is worth more. It has a higher price than IPv6 usage. It does not matter if broadband, hosting, virtual servers, ssl web sites, whatever. If it uses an IPv4 it is worth more. You can charge more. You need to charge more else you use up what IPv4s you have left.
So the economics change, drastically, and quickly. Simply having IPv4 space is now an asset - it is really worth wasting it on customers? We are not there yet, but how long before ISPs start using NAT even when they do not need to because they need to maximise a disposable asset? Some people will really need IP space, and some could just do with it, and some will have it. This creates a market place. Could there be IPv4 hording? What of IPv4 trading for the sake of IPv4 trading, like trading art - people not actually using the space, just holding it as an asset that will gain value? If fact, using it de-values it as it is harder to stop using it when you sell it...
When IPv4's start trading at £100 each, it may be worth selling a small ISP that has 100,000 IPv4s? No plans to, but what if they get to £1000 each?
Scams... We have not seen what the world of fraud will come up with! I am not sure I can even speculate on that. There will be scams of all sorts. The registries have made changes to make it hard to hijack IP space or sell it when it is not yours, but that won't stop the scams...
Hi everyone,
I’m looking for some feedback from you all on static analysis tools today and whether you can implement these tools without breaking the bank. If you follow me on Twitter you might have noticed that I’ve started a vendor selection process for a static analysis tool to compliment our manual processes here at Realex Payments. I know these tools aren’t cheap but I had a budget in mind which I thought would get us a solution that meets most of our requirements – boy was I wrong! I’m being realistic with our requirements, I can’t justify the cost of implementing an all singing all dancing, integrated into “everything” static and dynamic analysis solution but the figures I had in my mind would make the static analysis solution the largest application security expense for the company after application security resource salaries.
Are commercial static analysis solutions only affordable by consultancies and large software companies and are they actually worth that much money? I’m not trying to cause an argument or a flame war; I’m genuinely interested in people’s opinions on this.
We have a very strong SDLC with security ingrained throughout and this includes the use of some open source/free static analysis tools such as Findbugs, RIPS and CodePro AnalytiX to compliment our manual reviews. We also use in house tools such as Agnitio and have a commitment to conduct manual reviews of all code changes before they are deployed into production and we are currently trying to hire another application security analyst to ensure the manual review process continues to be executed twice per project.
So now I’m looking for opinions from you all on what you have done from a static analysis point of view. What has worked for you? Are these commercial tools genuinely worth the money and do the downsides (false positive for example) outweigh the positives in your experience? What vendors have you used and would you recommend them to another company, if yes why? What do we gain from buying an expensive commercial solution versus buying cheaper solutions such as Klocwork Solo to use in conjunction with the tools mentioned above and expanding in house tools such as Agnitio to include automated code crawling capabilities?
On a slightly different note I wonder if the cost of these application security solutions has a bearing on the low application security spend pointed out by Gunnar Peterson and Jeremiah Grossman recently? If I’m a manager/CxO looking at the cost of these solutions I’d be expecting miracles from them, in fact I’d probably point to the (much) lower cost of our network security solutions and ask why there is such a big difference! If the manager/CxO can hire 4 or 5 members of staff for the cost of an application security solution guess where the money is going to go……
I’m looking forward to hearing what you all think about this!
SN
The “slow query log” is the single most valuable way to examine query execution on your MySQL server. Queries are logged with timing information, and in the case of Percona Server, a great deal of additional performance and other diagnostic information. But the execution time recorded in the log is the time the query took on the server, and the client that sent the query to the server might see something quite different. Sometimes it’s valuable to be able to see both views of query execution time.
Why would you want to see the query timing information from the application server or other client program? I have run into a handful of scenarios where this was desirable. For example, sometimes there is no access to the database server. I’ve seen this when access was forbidden by management, and when a server was so overloaded that nobody could get into it if they wanted to. Another reason for logging from the client is to log only selected traffic that is of interest, instead of the potentially large and hard-to-analyze amount of traffic that the server might receive in total, much of which might have nothing to do with the task you’re trying to optimize or analyze. You might also be interested in gathering data that you can’t gather on the server with the slow query log, such as information in the TCP protocol that isn’t written into the standard server’s log. Finally, you might be interested in logging the full round-trip time, including network delay, as seen by the application server.
There are several techniques for logging queries with their execution times. I will list three.
One is to make the application time the queries and write a query log. I suggest writing out something in a standard logging format such as the slow-query-log format, so you can analyze it easily with tools such as Maatkit’s mk-query-digest. These tools can do a lot of work for you. Or, you can log queries to a database table and then use SQL to analyze the queries; but this is quite a bit harder, because similar queries need to be “fingerprinted,” which is impractical or impossible in SQL. The main drawback to making the application itself log the queries is that it might not log everything, such as superfluous ‘ping’ commands.
Another option is to make the client connector log the queries. For example, if you are using PHP with mysqlnd, you can create a plugin that logs the queries, such as the query logging demonstrated here. This is likely to be easier and more reliable than changing the application code itself, especially if queries don’t go through a single point in the code but are issued all over the place.
Finally, you can capture TCP traffic. There are several ways to do this with varying levels of accuracy and completeness; you can use tcpdump and some quick shell commands, for example. But with mk-query-digest’s built-in ability to decode the MySQL wire protocol, it is certainly the easiest and best solution for most needs. Just use tcpdump and the “–type=tcpdump” option to mk-query-digest. You can even make it print out a “slow query log file” format with the “–print” option.

Piccolo (not this or this) is a system for distributed computing, Piccolo is a new data-centric programming model for writing parallel in-memory applications in data centers. Unlike existing data-flow models, Piccolo allows computation running on different machines to share distributed, mutable state via a key-value table interface. Traditional data-centric models (such as Hadoop) which present the user a single object at a time to operate on, Piccolo exposes a global table interface which is available to all parts of the computation simultaneously. This allows users to specify programs in an intuitive manner very similar to that of writing programs for a single machine.
Using an in-memory key-value store is a very different approach from the canonical map-reduce, which is based on using distributed file systems. The results are impressive:
Experiments have shown that Piccolo is fast and pro-vides excellent scaling for many applications. The performance of PageRank and k-means on Piccolo is 11×and 4× faster than that of Hadoop. Computing a PageR-ank iteration for a 1 billion-page web graph takes only 70 seconds on 100 EC2 instances. Our distributed webcrawler can easily saturate a 100 Mbps internet uplink when running on 12 machines.
Piccolo was presented at OSDI10. For the paper take a look at Piccolo: Building Fast, Distributed Programs with Partitioned Tables, here's the slide deck, and there's a video of the talk (very good).
We’re down to less than 25 tickets remaining for Percona Live. If you are planning on attending, it might be a good idea to purchase your ticket(s) as soon as possible – we are expecting to sell out.
One of the most common uses for a javascript library is to handle a “hover” or “mouseover” event. Often you might want to highlight some element on the screen when you hover over it, or trigger some display changes and “undo” them when you mouse out. You often see this with web based applications, where the idea of hovering over an element might show additional tools or allow you to “retweet” some post, etc.
Regardless of the details, you may find that your first try at “mouseovers” doesn’t behave the way you expect. Here’s an example:
Click to the “result” tab and mouse over the first set of elements. Look at the mouseover/mouseout counts. Move your mouse around inside one of the green boxes. Look at all those “extra” events. Part of this has to do with event bubbling and such, but for now let’s just assume this is not your desired behavior. Perhaps you do some logging of events, and you don’t want all that extra noise.
YUI has a neat trick to avoid all this – and is similar in a way to jQuery in this respect. You can choose to change your code slightly to use some non-standard (but often used) events called mouseenter and mouseleave. These events will only be triggered when the mouse enters the bounding box of the element, or leaves the bounding box – regardless of the contents. Compare the code below with above in terms of the number of “mouseouts” and “mouseenters” :
See how much less “noisy” it is? It also makes a bit more sense code-wise, because you really don’t care if the mouse moves over other elements inside the box – effectively triggering a mouseout/mouseover combo, you only care if the mouse moves OUTSIDE of your particular box.
So, long story short — if you want to have less “noisy” code (and more performant) and accept having a bit of an abstraction in terms of events, then I suggest you use the “mouseenter” and “mouseeleave” events instead of the more traditional “mouseover” and “mouseout” — simply because they are closer to what you really meant for the code to do.
If you want to read more about this, go over to the YUI Docs page.
NOTE: you can simplify your code even more with “hover”, which takes two methods – one for the “enter” and one for the “exit” . Read more over at the YUI Doc site.
It’s great to get recognized in whatever industry your company is in, and at Rackspace, we want to be the leader in hosting and cloud computing. One of the many places that track the cloud race is jackofallclouds.com, which started tracking websites hosted on the top cloud providers back in July 2009.
In a recent post, jackofallclouds.com had a snapshot of the top 500,000 sites hosted by the top cloud providers. Rackspace Cloud Servers has been gaining on Amazon EC2 for some time and as of this month, it is truly neck-and-neck. Amazon EC2 is at 3,674 sites and Rackspace is at 3,662 sites. Third place was more than 50% behind Rackspace and Amazon. The graph points out a tight run between Rackspace and Amazon to be the top cloud host provider.

Indeed, the article goes on to say,
“Rackspace’s numbers look better than ever – they can be proud to be just a hair away from the top spot. For all intents and purposes, in this survey Amazon EC2 and the Rackspace Cloud are tied at first place.”
While no one likes to be second place, we consider this a big win for Rackspace. It’s also worth noting that Jack of all Clouds’s numbers do not include Rackspace Cloud Sites or websites hosted on our dedicated servers.
Rackspace is becoming the company to turn to for mission critical cloud hosting, and it shows: more companies are choosing Rackspace Cloud Servers to host their websites . We are proud to be doing so well against Amazon, a great company that we respect. We’ll keep watching Jack of all Clouds in the coming months to see what happens.
If you weren’t able to make it to the FogBugz/Kiln world tour, a video of my presentation is up now on YouTube.
(If you have a high bandwidth connection, try the “720p” option, which shows the screen more clearly.)
Need to hire a really great programmer? Want a job that doesn't drive you crazy? Visit the Joel on Software Job Board: Great software jobs, great people.
I wish there were a better to track changes in InnoDB. Right now my tools are luck and recursive diff.
You've heard about PHP namespaces by now. Most likely, you've heard about -- and likely participated in -- the bikeshedding surrounding the selection of the namespace separator.
Regardless of your thoughts on the namespace separator, or how namespaces may or may not work in other languages, I submit to you several reasons for why I think namespaces in PHP are a positive addition to the language.
Continue reading "Why PHP Namespaces Matter"
Rackspace Cloud Files is proud to announce that we will no longer be charging for requests, including Put, Post, and List requests – that’s right, all requests
are now FREE! At Rackspace, we’re committed to cutting our costs so we can cut your price, and that’s exactly what’s happened.
We know that when you’re storing and interacting with large amounts of data, requests can add up to a lot of money over time. We have already put this change into effect, so any bills generated after this point will not have charges for requests.
Cloud Files has lot of great things coming for our customers this year, and reducing your price is just the beginning.
Enjoy the savings and stay tuned to the blog for upcoming product news!
Last summer Kimberly and I were lucky enough to hire Brent Ozar (blog |twitter) when he left Quest. It's been a blast having Brent work for us and he's a great friend and a fun guy, even if he is a Mac addict :-)
At the start of the year it became apparent that Brent has some amazing marketing ideas. However, putting a bunch of unpaid time into taking SQLskills.com to the next level while a consulting employee didn't make sense. So Kimberly and I decided to bring Brent into the fold as a full partner and owner, effective immediately. This works so perfectly as we all get to manage the part of the business we're most happy and comfortable doing - Brent the marketing, sponsorship and business development, Kimberly producing all our events, and me managing our clients and consultants - while all still consulting, training, and staying technical.
You might wonder as to our sanity at giving Brent an ownership stake in the company, but you have no idea how valuable Brent's skills are, and how much none of the three of us want to do what the others like doing - it makes the most sense to team up for good.
But of course, we couldn't just announce this - we had to pretend Brent was leaving this morning on Twitter to have some fun with everyone :-)
In Brent's blog post, he wraps up by saying "This feels like the biggest risk I’ve ever taken, and yet the safest bet in the world". I couldn't agree more.
Exciting times ahead!
PS Check out Brent's post abouut the change here.
In my survey for this week I'm interested in what you look for first when analyzing a query plan.
I'll report on the results around mid-February.
em24_survey_x = "e411c631-9007-4d3a-b065-df17d0ebe1c5"; em24_survey_width = "255"; em24_survey_height = "333"; em24_styles = ".em24_s {border:solid 1px #626A84; width:250px;} .em24_s td {font-size:12px;} .em24_q {background:#798BC6; color:#ffffff;} .em24_ai0, .em24_at0 {background:#E7F3FF; border-top:1px solid #B3C7D9;} .em24_ai1, .em24_at1 {background:#E3F7DE; border-top:1px solid #B3C7D9;} .em24_v {background:#798BC6}";
Thanks!
PS Post comments are disabled to avoid skewing the results.
Lean Startup Machine came to San Francisco a few weeks ago, after successful debut runs in New York and Chicago. I’ve had the privilege of being a judge in previous weekends, but this is the first time I was able to attend in person.
When I first heard about Lean Startup Machine, I’ll admit I was skeptical. In my consulting practice, I struggle to get teams to do validated learning when they have months or years of runway. How much learning could a team do in only 48 hours, with no budget whatsoever?
But the results were amazing. Out of eleven teams, the three finalists we picked all had discovered honest-to-goodness viable businesses. Each had pivoted several times and had built one or more MVPs. One even had $100 in revenue (in escrow) from real live customers. Beyond the three finalists another few teams had made solid discoveries and would probably have been contenders if they’d just had a little more time. I bet more than half the teams wish they had just one more day – and could have achieved something great in that time.
Think about that for a second. If only they had one more day. Think how valuable a single day is, when used to its maximum potential. And now think how casually we throw a day of work away, when it’s just one tiny part of a huge batch, as in a monthly release cycle.
And that brings me to the fuzzy math that forms the title of this post: a month is fifteen weekends. Normally, we consider a month-long development cycle to be a short time. We can barely get anything done. We feel rushed. We are too busy to learn anything. We think we’re acting fast. The Lean Startup Machine puts the lie to that idea. I forced me to think about how much validated learning these teams could accomplish if they had a whole month - fifteen times longer - to work.
It wasn’t just that the winning teams were so productive. It was the wide disparity between the most and least productive teams. Remember, everyone had the same amount of time. Most teams were about the same size. And every team started with a promising idea (which were chosen by voting on the first day). But some teams managed to get only a single landing page-style MVP done over the course of the whole weekend. Some managed little more than a survey of customers, or a few conversations. Their learning was decidedly un-validated: stories, anecdotes, maybe some strong thinking at the whiteboard.
Talking to attendees afterwards, I got the sense that these laggards learned the most from the weekend. I bet when those teams disperse and go back to their day jobs (many of whom work at startups), they will have the biggest impact on their teams and colleagues. Because they saw, first hand, just how hard it is to make progress, even with a great team and great idea, if you’re not flying through the Build-Measure-Learn feedback loop.
For example, one team managed to put together a very decent looking minimum viable product, in the form of a landing page with a “click to signup button” that basically did nothing but collect data about who was clicking. And their MVP even had a reasonably high click rate. But is a 25% click rate validation of the idea? That depends on who’s clicking and why. Do they understand the product? Are they eager to try it, or were they just enticed by the shiny button? Unfortunately, the team had no way of answering these questions. They weren’t even collecting contact information from these first customers. They were just counting clicks.
I don’t mean to pick on this team. After all, I can easily imagine how they got in this situation. They were trying to find the minimum viable product, after all. I can almost hear the argument in my imagination: “guys, let’s just ship this thing and see what happens – we can always add contact fields later.” This is a classic startup fallacy: “ship it and see what happens.” Whenever you use this plan, you are guaranteed to succeed – at seeing what happens. Unfortunately, if you cannot fail, you cannot learn. And so this team had succeeded in building and shipping something; they even had some measuring. But they could not learn, and – most importantly – they could not get another turn through the feedback loop. Remember, an MVP is not an event in itself; it is the beginning of a process. (Dare I say it, "you still have to iterate?")
I don’t know how many real companies will come out of a weekend. After all, the scientific method makes it easier to disprove a bad idea than “prove” a good idea. In one notable case, a team was able to conclusively invalidate a business that I have been pitched by venture-backed entrepreneurs many times – with a full day to spare. Compared to entrepreneurs who’ve blown millions of dollars pursuing the same vision, this is a way better outcome. Since they had extra time, they tried a pivot into a much more promising idea. By the time of the judging, they had an MVP in the market with real customers signed up. Will this new idea ultimately prove profitable? Only time will tell.
Regardless, I’m a believer in Lean Startup Machine because of its educational impact. Reading is good. Going to conferences is good. Watching videos is good. But taking action is better. Lean Startup Machine offers everyone the opportunity to learn some of our key tenets, by immersion, rather than by rote: getting out of the building, pivoting, minimum viable product, and validated learning. In real life, the most comfortable thing is to stay in the building, leave assumptions untested, and hope for the best. LSM forces participants to go outside the comfort zone, or fail.
I was surprised by the number of people who attend Lean Startup Machine who already work in a startup full-time. They are not there to learn if their current startup idea is good or bad. They are there to learn a new, faster process that they can bring back to their company.
As one participant wrote,
If you’re reading this right now, and wondering if maybe your company could go a little faster, pay attention to Lean Startup Machine. It may be coming soon to a city near you. The next LSM is scheduled for Boston on Feb 25, 2011. Early bird registration just started, and it’s sure to sell out."The Lean Startup Machine was a synthesis of a lot of my frustrations with how the companies I've worked for over the past few years have created products, and showed me that a much better way was possible. A hugely eye opening experience for me, and I had a blast working with really smart, energetic people."
Percona was working closely with Virident on evaluating tachIOn as solution for MySQL, and as result you can find whitepaper “Scaling MySQL Deployments Efficiently Using Virident tachIOn Drives”, available from Virident website. It was done as part of our consulting practice, but all results and numbers are certified by Percona. I personally really enjoyed performance and stability provided by tachIOn card, and you actually saw bunch of benchmarks results published on our blog which proves it.
Virident tachIOn will be on the list of our recommendations for customers looking to improve performance or building high-performance solution based on MySQL.
Focus is putting together a Cloud Computing roundtable with Ben Kepes as moderator and panelists, Christian Reilly, Jim Battenberg (Product Marketing, Rackspace Cloud) and John Taschek (Strategy, Salesforce.com).
Topics include:
1) Is hybrid cloud merely a stepping stone?
2) Will the world really be completely cloudy one day?
3) When will we stop talking about reduced costs?
WHEN:February 8, 2011 11am PT/ 2pm ET
WHERE:
Toll-free Dial-In Number: (866) 951-1151
International Dial-In Number: (201) 590-2255
Conference # : 4999006
To follow the conversation and to submit your own questions use keyword FocusCloudRT on Focus.com and #FocusRT on Twitter.
I have a new deadline to beat for this post. Wednesday February 16, 2011 6 pm, that's my new deadline. That’s when we hold the next HUG at the Yahoo! Sunnyvale Campus at the URL’s Café. BTW, in case you are looking for it, here’s the campus in more details: http://www.wikimapia.org/#lat=37.4181633&lon=-122.0250607&z=18&l=0&m=b&search=yahoo
Why is that my new deadline? Mostly, because I missed my old deadline which was a commitment to the 200 or so Hadoopers that attended the January HUG session that the presentations from the January HUG would be made available soon on the Hadoop blog on YDN, here: http://developer.yahoo.com/blogs/hadoop/
If I don’t beat this new deadline, I have this picture in my mind of the same 200 Hadoopers that showed up at the January HUG showing up at the February HUG surrounding me to remind me of my commitment to share the slides from the January HUG.
So, here are the Wednesday January 19, 2011 HUG presentations in order of appearance:
New features in Pig 0.8: Pig 0.8 focused on extending Pig's usability. We added the ability to write UDFs in scripting languages like Python, gave users better access to statistics, and created PigUnit to help users test their Pig Latin, to name only a few. Of course we continued to improve performance too by enabling compression of intermediate results and collecting together small blocks into a single mapper. We'll cover these and more in this overview of Pig 0.8s new features plus talk about what we're working on now for 0.9. Presenter: Alan Gates, Yahoo!
Kafka: LinkedIn's Real-time Data Stream System: Kafka is a distributed, real-time, persistent messaging system developed at LinkedIn. It supports horizontally distributing message production, brokering and consumption over commodity machines. This system serves as the backbone of LinkedIn's log aggregation and activity processing system, providing data feeds for Hadoop as well as real-time consumers. Presenter: Jay Kreps, LinkedIn
Howl: Table Management Service for Hadoop: Howl is a project that aims at providing a table management service. Data processors using Hadoop have a common need for table management services. The goal of this service is to track data that exists in a Hadoop grid and present that data to users in a tabular format. The table management service will present data in an uniform format to all tools like Map Reduce, Streaming, Pig, and Hive, by providing interfaces to each of these data processing tools. Presenter: Devaraj Das, Yahoo!
If you haven’t yet signed up for the February HUG, you can sign up here:
http://www.meetup.com/hadoop/events/16116942/
Looking forward to meeting you at the February HUG.
Scalr is software that auto-scales web applications by provisioning and configuring resources like servers, storage, and IP addresses. These resources are
taken from the cloud, like EC2 and now Rackspace Cloud.
Hi folks, I’m Sebastian, co-founder of Scalr, which is part of the Rackspace Cloud Tools program, available for use with Cloud Servers.
When we started Scalr, all we wanted it to do was provide auto-scaling on EC2. This was to be done in simple form: load goes up, add more servers. Load goes down, remove a few.
But then we discovered that there was more to auto-scaling than just adding more servers of the same type. We needed to make sure that load was balanced properly. That no server termination caused users to be cut off. That dysfunctional servers were replaced. That servers were always kept 55 minutes into the hour. And for databases, that data was replicated between nodes.
We also discovered that an intuitive, fast UI was important for our users to get their job done. Same for our hosted DNS. Same for our service config tool. And as we make Scalr the best auto-scaling platform out there, people have knocked on our door asking for us to add auto-scaling to Rackspace Cloud. Over, and over, again. So we did.
Let me say that Rackspace Cloud is a great home to any web app. Not only is it close to feature parity with EC2, it also has some useful abilities of its own, like server Resizing. Server resizing is great for a particular use case: scaling your master database to handle more write transactions. This avoids all sorts of brittle fancywork, and provides a simple solution to the single largest bottleneck in web apps today.
You also get Fanatical Support, the fitting name Rackspace gives to its support team. This is very useful when using Cloud infrastructure for the first time.
Oh, and by the way; since we started working on supporting Rackspace Cloud, we’ve joined OpenStack under the sexy name openstack-platform-php. Yay!
For more information, check out scalr.net or register to attend this webinar. Documentation is found at wiki.scalr.net, and source code at code.google.com/p/scalr. Enjoy!
Click here to register for webinar.
I had a very pleasant phone call with Erica and Daniel of BitNami just yesterday. They gave me a complete briefing on their new BitNami Cloud Hosting product, which they just announced a few minutes ago.
BitNami Cloud Hosting is designed to make life easier for solution providers. The goal is to provide solution providers with the ability to easily launch, manage, and maintain a number of popular application and development stacks for their customers. Supported applications include:
- SugarCRM (customer relationship management).
- Redmine (project management).
- JasperServer (reporting).
- Alfresco WordPress, ezPublish, Joomla, ocPortal, Typo, and Drupal (blogging and content management).
- Coppermine and Gallery (photo galleries).
- Dokuwiki (documentation wiki).
- Moodle (learning and course management).
- Mantis (bug tracking).
- phpBB (bulletin board).
- Spree (e-commerce).
- Tracks (GTD tracker).
You can deploy one or more applications on a single EC2 instance. Let's take a tour.
The first step is to create an account. You need to enter your AWS credentials here. Because this product is aimed at solution providers, you can even enter credentials for more than one AWS account. You could also enter AWS user credentials (created via AWS Identity and Access Management here if you'd like):
You can see all of your servers along with estimated monthly and hourly costs:
The next step is to create a server and choose the applications to run on it (you can run several applications on one EC2 server if you'd like):

You can choose between 32 and 64 bit versions of Ubuntu Linux, choose an EC2 instance type, and configure the desired amount of disk space (this step is optional and defaults are supplied). Monthly costs are displayed for each option:
The server is up and running in a few minutes. Once running, it is easy to make backups (either impromptu or scheduled), assign an Elastic IP address, or to check the status of the server:


You can set up a schedule to run the server only at certain times of day, on certain days of the week:
Each server that you launch via BitNami provides access to your applications using a customized entry portal
You can easily move your applications to a larger or smaller EC2 instance or adjust the amount of disk space that they available (Erica told me that they'll allow server resizing to be schedule-driven in the near future, giving you the ability to run a large server during the day and a small one at night):
BitNami Cloud Hosting is available as a paid service (see the pricing plans) after a 30 day free trial. If you sign up before midnight on February 12, 2011, you'll get 3 months of free service. You'll pay the usual rates for your use of EC2 and EBS. If you are eligible for the AWS free usage tier you can use the free BitNami trial to get started without incurring any charges.
Erica and Daniel told me that they are working hard to add additional features to BitNami Cloud Hosting, including scheduled server resizing, support for regions other than US-East, additional Linux distributions, Microsoft Windows, and notifications.
You may also want to watch Erica's Bitnami Cloud Hosting Tour video.
-- Jeff;
Last week I kicked off a survey about network latencies and database mirroring. Seehere for the original post.
Here are the results of the survey:

I was really interested to see whether the proportion of people doing asynchronous mirroring became higher as the network latency increased. Although this isn't a statistically valid sampe by any means, it does show that the answer is no. However, we're missing some data that would help explain what we see here: how long are the average transactions and is there a response time SLA?
The latency between the principal and the mirror is a big deal for synchronous mirroring, because a transaction on the principal cannot be acknowledged to the user/app as having committed until all of it's log records have been written to the mirror database's log drive.
NOTE: the transaction does NOT have to be replayed/committed on the mirror, simply the log records have to be durable to guarantee the transaction is durable if the principal has a disaster. This is a very common misconception.
If the average transaction length is quite long, say 20 seconds, then the addition of another 500ms of latency when the commit is issued is not a big deal. But if the average transaction length is 100ms then an extra 500ms is a *very* big deal. This is when using asynchronous mirroring starts being considered - as transactions on the principal do NOT have to wait, but at the expense of potential data loss if the principal experiences a disaster. However, if there is no response time SLA, then the company may be fine with the extra delay with synchronous mirroring to guaranteezero data loss (as long as the mirror session stays SYNCHRONIZED).
As always, the choice of HA and DR technology comes down to analyzing requirements and limitations before choosing a technology. I go into this in more detail in the whitepaper I wrote in 2009 for Microsoft:High Availability with SQL Server 2008. There is also an excellent whitepaper on database mirroring:Database Mirroring Best Practices and Performance Considerations.
If you're one of the people who responded that you don't know your network latency even though you're using mirroring, check out the post I wrote last week:Importance of monitoring a database mirroring session.
Thanks!
Given this compromise, I'd like to point out one of my favorite topics in application security — password hashing.

It's all too common that Web (and other) applications use MD5, SHA1, or SHA-256 to hash user passwords, and more enlightened developers even salt the password. And over the years I've seen heated discussions on just how salt values should be generated and on how long they should be.
Unfortunately in most cases people overlook the fact that MD and SHA hash families are designed for computational speed, and the quality of your salt values doesn't really matter when an attacker has gained full control, as happened with rootkit.com. When an attacker has root access, they will get your passwords, salt, and the code that you use to verify the passwords.
And this is the assumption any security design should be based on; an attacker has access to everything that is on the server.
Salt is primarily intended to prevent precomputed attacks, also known as rainbow tables. And a common assumption has been that as long as precomputed attacks are prevented, passwords are relatively safe even if attacker would get the salt value along with user password.
But MD and SHA hash variants have been designed for computational speed, which means that an attacker can easily get billions of brute force attempts per second when using a video graphics display card for processing.
See: http://www.golubev.com/hashgpu.htm
Which means that even with single ATI HD 5970, an attacker can cover password space equivalent to a typical rainbow table (2^52.5 hashes) in 33 days. And it's a safe bet that a serious attacker will have more than one card for the job.
When an attacker has your salt values and code, the only thing that is protecting user accounts is the strength of passwords they are using, and people are not very good sources of entropy. By combining dictionary attack and brute force techniques it will not take very long to break a significant proportion of passwords, even for a large site with many accounts.
So what should be done to avoid this?
The first thing to consider is that passwords are very much like safes in the real world, what matters is not only the length of the code needed to open the safe that protects the contents, but also how long each attempt takes.
This clearly means that SHA1 or any other plain hash algorithm is clearly a no go for secure password authentication.
What you want to use is something that will not be trivial to brute force. Instead of doing 2300 million attempts per second, you want something that limits an attacker to 10,000 or 100,000 attempts per second.
And while using salt values is vital to proper implementation, it is not a silver bullet which will make your problem go away.
This requires a password hashing scheme that fulfills the following properties:
• Computational time required can be adjusted easily when processing power increases
• Each user can have unique number of iterations
• Each user hash is unique so that it is impossible to find out if two users have same password by comparing hashes
There are several such schemes to choose from:
• PBKDF2 http://en.wikipedia.org/wiki/PBKDF2
• Bcrypt http://www.openwall.com/crypt/
• PBMAC http://www.rsa.com/rsalabs/node.asp?id=2127
Each of the alternatives has their strengths and weaknesses, but all of them are far stronger than general purpose hash implementations such as SHA1+salt.
So if you are working with passwords, pick one of the schemes above, determine the number of iterations it takes your server check the password for the desired length of time (10, 200ms, et cetera) and use that. Have a unique salt value and iteration count for each user — anything that forces the attacker to focus on each account separately rather than being able to try against all accounts on each iteration.
Updated to add: I had accidentally put HMAC on the list of suggested schemes when I should have put PBMAC. Props to Matthew and Sagres for pointing out the error.
On 09/02/11 At 08:39 AM
For this post, I am turning the keyboard over to my colleague Chris Wheeler. He's the Technical Program Manager for the Amazon Simple Email Service...
A few weeks have passed and the dust has settled a bit. If you haven’t heard, Amazon Simple Email Service launched as a new highly scalable and cost-effective marketing and transactional email service. We thank all the folks who’ve tried the service out and are successfully sending their email through SES. The fact that several other email application providers have begun building on top of the Amazon SES stack further shows how simple and flexible the service really is. Thousands have subscribed to Amazon SES, and many have sent significant quantities of email already. The Amazon SES forum has been popping.
Let me take a brief moment to remind everyone of the process for sending email through Amazon SES. Subscribe to Amazon SES by clicking here. You’ll need a valid AWS account if you don’t already have one. You’ll be placed in the sandbox where you can test the service and get familiar with the technical requirements. To get production access, click here and you’ll be taken to an application you can fill out in seconds. After review, you will receive an email from Amazon SES when you’ve been approved. You can also use the GetSendQuota API to check your sending limits at any time. Once in production, you can begin increasing your mail volume over time as you continue to send high quality mail.
Recommended migration strategies for sending email with SES can be found here. Finally, if you find you need to send more mail than the initial Amazon SES quota allows, click here and we’ll be happy to work with you.
Finally, if you haven’t already checked out Amazon SES, we welcome you to do so in a live presentation I’ll be hosting. Sign up today for the AWS webinar, Introduction to Amazon Simple Email Service on Friday, February 11, 2011 10:00 AM - 11:00 AM PST. Register here.
Here are some quick reference links for more information as well:
This post was composed by Issac Roth, Director of Product Marketing – Cloud Solutions at Red Hat
Red Hat’s Makara team is pleased to announce the availability of the Makara Cloud Application Platform for use with The Rackspace Cloud.
Makara is a Platform-as-a-Service that allows you to deploy, manage, monitor and auto-scale your new or existing Java and PHP applications in the cloud with little or no modifications.
You can learn more about how Makara works by going checking out our resources page for videos, tutorials and How-To guides.
Makara has been supporting Amazon’s public cloud for almost a year now, but we kept hearing from devops folks and app developers that they would love to see Makara available on The Rackspace Cloud. Ask and you shall receive!
Getting started with Makara on The Rackspace Cloud is easy. Just head on over to the Try-It linkon the Makara site to get your trial started.
We are also hosting a webinar on March 17, 2011 at 2PM CST where I’ll be demonstrating how to deploy, manage, monitor and scale your Java or PHP app on The Rackspace Cloud, plus cover more of the features available with Makara.
Click here to register for the live webinar.
Finally, as many of you know, Makara was acquired by Red Hat in November 2010. Under Red Hat, we are providing the current version of the Makara Cloud Application Platform as a FREE Developer Preview! This Developer Preview is unsupported, though we will participate in community forums and IRC to offer advice and discussion. Future versions will be incorporated into the upcoming Red Hat Platform as a Service offering — stay tuned for more exciting news on this! For more information concerning Red Hat’s PaaS and Cloud Foundations offerings, visit the Cloud Foundations website.
If you still have questions, drop us a line, we’d love to hear from you!
Many developers may not be getting enough out of the database tools they currently use. Several others are paying too much for the ones that do deliver. PremiumSoft is here to combat this problem with Navicat, a unique and easy-to-use database administration tool.
What is Navicat?
Navicat is a highly rated database tool designed to meet the needs of database administrators and developers who need a fast, reliable and affordable solution to their needs. Its GUI allows users to create, organize and share information in a secure and easy manner.
With its powerful, intuitive and user-friendly GUI users can manipulate data with peace of mind and not have to worry about the long-winded learning process that the other tools out there require you to endure.
Flexibility is also an important characteristic that not many tools have in common. However, Navicat allows for complete flexibility. It is available for several server types including MS SQL Server, MySQL, PostgreSQL, SQLite and Oracle. It supports Windows, Mac OS-X and Linux operating systems. Our flagship product, Navicat Premium, gives users complete flexibility and ease of use by allowing connections to multiple database servers within a single application.
How can Rackspace clients use Navicat for their needs?
Rackspace users frequently use either MySQL or Microsoft SQL Server databases, or sometimes both. Clients can use Navicat to perform complex operations such as batch-job scheduling, generating reports, data synchronization, importing/exporting data, visual query editing and much more. Navicat is a great tool for performing complex tasks related to database design and data manipulation.
Getting Started is so easy too!
To start working with Navicat, the first thing you will need to do is establish a connection to your databases.
Next you would just enter the Rackspace MySQL connection information; such as host IP address, port, username and password. Click Test Connection to check the connection and then click OK to save the connection settings.
From the control panel you can see the databases that sit under the account. Now, you can start enjoying the great features that Navicat has to offer. It’s just that easy.
Has anyone noticed that pretty much every puppet module one finds on the internet by default enables the service they try to configure in the module
When looking at it from a single machine point of view it makes sense to include the module , have it configure your service and directly enable it by default.
So I started wondering .. isn't there anybody out there who is building clusters ? Where services have to configured on multiple nodes but should NOT be running acitvely on all nodes by default because there is an external tool which manages that for you (Pacemaker framework eg.)
Agreed it's a small patch to get the functionality you want , but it brings an extra overhead when one upgrades the modules etc.
So if it doesn't bother you please split your puppet module in 2 parts.. one you call to configure the service, another which you call to enable the service , if you want to.
thnx !
-->Trackback URL for this post:
The performance problems caused by battery auto learning go many years back. We wrote about it, other people from MySQL Community too. The situation did not get better, at least not with Dell RAID controllers, H700 and H800 have the same problem too. At the same time situation got worse as a lot more people are running Innodb in full durability mode which is dramatically affected by this setting.
First I should wonder how common this problem is outside of Dell product line ? (Which is using LSI chips) Are there RAID controllers which do not have this problem ? For many installations it would make sense to pay few hundred dollars more per server just to avoid nightmare of scheduling learning cycles.
I’m surprised it takes so many years to do it. Can’t one use capacitor instead and bundle 512MB cache with 512MB of Flash so when power goes down the cache is stored on Flash ? Or can’t one put batteries which can be moved independently ?
It looks like H800 has “transportable non volatile cache (TVNC) as an option but it does not look like it.
As the problem is still there what can you do about it ? First Test it. You can trigger learn cycle by disabling auto-learn and when triggering learning either by MegaCLI or by Open Management tools (see this for example). You will see for how long battery cache gets disabled in your system (it is only part of all learn phase). You can also shift cache mode in Write Through if you do not have very long time for testing. I recommend this testing as part of complex IO subsystem performance testing – if you have RAID check what performance is going to be with failed hard drive (and during rebuild stage), what overhead LVM takes for backup etc. It may be performance drop is not such a bad issue for you so you can just take it during the night or you might need to do something such as getting Slave out of rotation when it is going through the process.
Second. Schedule it. Most systems would be much better with scheduled learning during the night or weekend, where it can be done on different servers at different times with team informed about slower performance than catching everyone by surprise (at least first time).
Third you may chose to compromise on ACID during such period of times. RAID gives an option to force write back even with no battery which will likely trash your database if power goes down during learning process. It may be fine for your data if not you may be able to get less penalty going from innodb_flush_log_at_trx_commit=1and sync_binlog=1 to values 2 and 0 appropriately. Both can be done without server restart with Innodb Plugin, Percona Server and MySQL 5.5. Note it might also be good to increase innodb_write_io_threads
to get more outstanding requests – without cache it matters a lot for writes. This is a lot better than forcing write cache without Battery as database should not get corrupted in case of bad crash timing, though you may lose some uncommitted transactions and binlog may get out of sync with Innodb transaction logs.
I’m also wondering if this is something where Facebook Flash Cache can be helpful – if it can act instead of hardware BBU cache. Would be interesting to test.
by
Arun C. Murthy
Lead, Hadoop Map-Reduce Development Team, Yahoo
This blog post describes the Capacity Scheduler, a pluggable MapReduce scheduler for Apache Hadoop, which allows for multiple-tenants to securely share a large cluster such that their applications are allocated resources in a timely manner under constraints of allocated capacities.
We have developed and deployed the Capacity Scheduler on over 40,000 Hadoop machines at Yahoo since 2008.
Please note that some of the features described in this post are currently available only in the Apache Hadoop0.20-security branch and we are feverishly working to port to Apache Hadoop trunk as part of ourDoubling Down on Apache effort. Folks are welcome to try therelease artifacts from this branch and provide us feedback – note that these artifacts are not an official Apache release yet.
The Capacity Scheduler is designed to run Hadoop Map-Reduce as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and the utilization of the cluster while running Map-Reduce applications.
Traditionally each organization has it own private set of compute resources that have sufficient capacity to meet the organization's SLA under peak or near peak conditions. This generally leads to poor average utilization and the overhead of managing multiple independent clusters, one per each organization. Sharing clusters between organizations is a cost-effective manner of running large Hadoop installations since this allows them to reap benefits of economies of scale without creating private clusters. However, organizations are concerned about sharing a cluster because they are worried about others using the resources that are critical for their SLAs.
The Capacity Scheduler is designed to allow sharing a large cluster while giving each organization a minimum capacity guarantee. The central idea is that the available resources in the Hadoop Map-Reduce cluster are partitioned among multiple organizations that collectively fund the cluster based on computing needs. There is an added benefit that an organization can access any excess capacity not being used by others. This provides elasticity for the organizations in a cost-effective manner.
Sharing clusters across organizations necessitates strong support for multi-tenancy since each organization must be guaranteed capacity and safeguards to ensure the shared cluster is impervious to single rouge job or user. The Capacity Scheduler provides a stringent set of limits to ensure that a single job or user or queue cannot consume a disproportionate amount of resources in the cluster. Also, the JobTracker of the cluster, in particular, is a precious resource and the Capacity Scheduler provides limits on initialized/pending tasks and jobs from a single user and queue to ensure fairness and stability of the cluster.
The primary abstraction provided by the Capacity Scheduler is the concept of queues. These queues are typically setup by administrators to reflect the economics of the shared cluster.
The Capacity Scheduler supports the following features:
- Capacity Guarantees - Support for multiple queues, where a job is submitted to a queue. Queues are allocated a fraction of the capacity of the grid in the sense that a certain capacity of resources will be at their disposal. All jobs submitted to a queue will have access to the capacity allocated to the queue. Administrators can configure soft limits and optional hard limits on the capacity allocated to each queue.
- Security - Each queue has strict ACLs which controls which users can submit jobs to individual queues. Also, there are safeguards to ensure that users cannot view and/or modify jobs from other users if so desired. Also, per-queue and system administrator roles are supported.
- Elasticity - Free resources can be allocated to any queue beyond its capacity. When there is demand for these resources from queues running below capacity at a future point in time, as tasks scheduled on these resources complete, they will be assigned to jobs on queues running below the capacity. This ensures that resources are available in a predictable and elastic manner to queues, thus preventing artificial silos of resources in the cluster, which helps utilization.
- Multi-tenancy - Comprehensive set of limits are provided to prevent a single job, user and queue from monopolizing resources of the queue or the cluster as a whole to ensure that the system, particularly the JobTracker, isn't overwhelmed by too many tasks or jobs.
- Operability - The queue definitions and properties can be changed, at runtime, by administrators to minimize service disruption. Also, a console is provided for users and administrators to view current allocation of resources to various queues in the system.
- Resource-based Scheduling - Support for resource-intensive jobs, wherein a job can optionally specify higher resource-requirements than the default, there-by accommodating applications with differing resource requirements. Currently, memory is the resource requirement supported.
- Job Priorities - Queues optionally support job priorities (disabled by default). Within a queue, jobs with higher priority will have access to the queue's resources before jobs with lower priority. However, once a job is running, it will not be preempted for a higher priority job, preemption is on the roadmap is currently not supported.
The Capacity Scheduler is available as a JAR file in the Hadoop tarball under the contrib/capacity-scheduler directory. The name of the JAR file would be on the lines of hadoop-*-capacity-scheduler.jar.
You can also build the Scheduler from source by executing ant package, in which case it would be available under build/contrib/capacity-scheduler.
To run the Capacity Scheduler in your Hadoop installation, you need to put it on the CLASSPATH. The easiest way is to copy the hadoop-*-capacity-scheduler.jar from to HADOOP_HOME/lib. Alternatively, you can modify HADOOP_CLASSPATH to include this jar, in conf/hadoop-env.sh.
Configuration
Using the Capacity Scheduler
To make the Hadoop framework use the Capacity Scheduler, set up the following property in the site configuration:
|
Property |
Value |
|
mapred.jobtracker.taskScheduler |
org.apache.hadoop.mapred.CapacityTaskScheduler |
Setting up queues
You can define multiple queues to which users can submit jobs with the Capacity Scheduler. To define multiple queues, you should use the mapred.queue.names property in conf/hadoop-site.xml.
The Capacity Scheduler can be configured with several properties for each queue that control the behavior of the Scheduler. This configuration is in the conf/capacity-scheduler.xml.
You can also configure ACLs for controlling which users or groups have access to the queues in conf/mapred-queue-acls.xml.
Queue properties
Resource allocation
The properties defined for resource allocations to queues and their descriptions are listed in below:
|
Name |
Description |
|
mapred.capacity-scheduler.queue.<queue-name>.capacity |
Percentage of the number of slots in the cluster that are made to be available for jobs in this queue. The sum of capacities for all queues should be less than or equal 100. |
|
mapred.capacity-scheduler.queue.<queue-name>.maximum-capacity |
Defines a limit beyond which a queue cannot use the capacity of the cluster. This provides a means to limit how much excess capacity a queue can use. By default, there is no limit. The maximum-capacity of a queue can only be greater than or equal to its minimum capacity. Default value of -1 implies a queue can use complete capacity of the cluster. This property could be to curtail certain jobs, which are long running in nature from occupying more than a certain percentage of the cluster, which in the absence of pre-emption, could lead to capacity guarantees of other queues being affected. One important thing to note is that maximum-capacity is a percentage, so based on the cluster's capacity it would change. So if large no of nodes or racks get added to the cluster, maximum Capacity in absolute terms would increase accordingly. |
|
mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent |
Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is competition for them. This user limit can vary between a minimum and maximum value. The former depends on the number of users who have submitted jobs, and the latter is set to this property value. For example, suppose the value of this property is 25. If two users have submitted jobs to a queue, no single user can use more than 50% of the queue resources. If a third user submits a job, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queue's resources. A value of 100 implies no user limits are imposed. |
|
mapred.capacity-scheduler.queue.<queue-name>.user-limit-factor |
The multiple of the queue capacity, which can be configured to allow a single user to acquire more slots. By default this is set to 1, which ensures that a single user can never take more than the queue's configured capacity irrespective of how idle the cluster is. |
|
mapred.capacity-scheduler.queue.<queue-name>.supports-priority |
If true, priorities of jobs will be taken into account in scheduling decisions. |
Job initialization
Capacity scheduler lazily initializes the jobs before they are scheduled, for reducing the memory footprint on JobTracker. The following are the parameters control the initialization of jobs per-queue:
|
Name |
Description |
|
mapred.capacity-scheduler.maximum-system-jobs |
Maximum number of jobs in the system, which can be initialized, concurrently, by the Capacity Scheduler. Individual queue limits on initialized jobs are directly proportional to their queue capacities. |
|
mapred.capacity-scheduler.queue.<queue-name>.maximum-initialized-active-tasks |
The maximum number of tasks, across all jobs in the queue, which can be initialized concurrently. Once the queue's jobs exceed this limit they will be queued on disk. |
|
mapred.capacity-scheduler.queue.<queue-name>.maximum-initialized-active-tasks-per-user |
The maximum number of tasks per-user, across all the of the user's jobs in the queue, which can be initialized concurrently. Once the user's jobs exceed this limit they will be queued on disk. |
|
mapred.capacity-scheduler.queue.<queue-name>.init-accept-jobs-factor |
The multiple of (maximum-system-jobs * queue-capacity) used to determine the number of jobs, which are accepted by the scheduler. The default value is 10. If number of jobs submitted to the queue exceeds this limit, job submission are rejected. |
Resource based scheduling
The Capacity Scheduler supports scheduling of tasks on a TaskTracker (TT) based on a job's memory requirements in terms of RAM and Virtual Memory (VMEM) on the TT node. A TT is conceptually composed of a fixed number of map and reduce slots with fixed slot size across the cluster. A job can ask for one or more slots for each of its component map and/or reduce slots. If a task consumes more memory than configured the TT forcibly kills the task.
Currently the memory based scheduling is only supported in Linux platform.
Additional scheduler-based configuration parameters are as follows:
|
Name |
Description |
|
mapred.cluster.map.memory.mb |
The size, in terms of virtual memory, of a single map slot in the Map-Reduce framework, used by the scheduler. A job can ask for multiple slots for a single map task via mapred.job.map.memory.mb, up to the limit specified by mapred.cluster.max.map.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off. |
|
mapred.cluster.reduce.memory.mb |
The size, in terms of virtual memory, of a single reduce slot in the Map-Reduce framework, used by the scheduler. A job can ask for multiple slots for a single reduce task via mapred.job.reduce.memory.mb, up to the limit specified by mapred.cluster.max.reduce.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off. |
|
mapred.cluster.max.map.memory.mb |
The maximum size, in terms of virtual memory, of a single map task launched by the Map-Reduce framework, used by the scheduler. A job can ask for multiple slots for a single map task via mapred.job.map.memory.mb, up to the limit specified by mapred.cluster.max.map.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off. |
|
mapred.cluster.max.reduce.memory.mb |
The maximum size, in terms of virtual memory, of a single reduce task launched by the Map-Reduce framework, used by the scheduler. A job can ask for multiple slots for a single reduce task via mapred.job.reduce.memory.mb, up to the limit specified by mapred.cluster.max.reduce.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off. |
|
mapred.job.map.memory.mb |
The size, in terms of virtual memory, of a single map task for the job. A job can ask for multiple slots for a single map task, rounded up to the next multiple of mapred.cluster.map.memory.mb and up to the limit specified by mapred.cluster.max.map.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off iff mapred.cluster.map.memory.mb is also turned off (-1). |
|
mapred.job.reduce.memory.mb |
The size, in terms of virtual memory, of a single reduce task for the job. A job can ask for multiple slots for a single reduce task, rounded up to the next multiple of mapred.cluster.reduce.memory.mb and up to the limit specified by mapred.cluster.max.reduce.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off ifff mapred.cluster.reduce.memory.mb is also turned off (-1). |
Reviewing the configuration of the Capacity Scheduler
Once the installation and configuration is completed, you can review it after starting the MapReduce cluster from the admin UI.
- Start the MapReduce cluster as usual.
- Open the JobTracker web UI.
- The queues you have configured should be listed under the Scheduling Information section of the page.
- The properties for the queues should be visible in the Scheduling Information column against each queue.
- The /scheduler web page should show the resource usages of individual queues.
Example
Here is a practical example for using Capacity Scheduler:
|
<?xml version="1.0"?> |
Nokia announced today that they will adopt Windows Phone as the primary operating system for its future smart phones.Coming from the world's largest mobile phone manufacturer, this is a historic announcement.
While the vast majority of PC malware is written for Windows, Windows Phone 7 is a entirely different ballgame.
The security model of Windows Phone 7 is quite different from Windows XP/Vista/7/et cetera, and includes features such as Application Certification, Isolated Storage, and Application Isolation. For example, third party applications can not run in the background because of security concerns.
Windows Phone 7 and XBOX are the only Microsoft platforms where applications must be pre-approved by Microsoft before users can run them.
As a result, we don't expect any major mobile malware outbreaks just because of Nokia's partnership.
On 11/02/11 At 11:44 AM
I spend a lot of time traveling all around the world talking about Cloud Computing. Often I end up talking to IT and financial folks who want to have a discussion about cost, usually it goes along the lines of “But cloud WILL save me money won’t it?”
I’m kind of against these sort of black and white discussions, mainly because they miss the real point of Cloud Computing, that is the value that cloud can bring to an organization. I like to think that the discussion around Cloud Computing and cost should instead be had around the broader issues related to Cloudonomics (a term invented by one of the real thought leaders in the space, Joe Weinman). So, what impacts does Cloud Computing have on an organization? Well, as we see it, cloud impacts in the following ways;
- It lowers the opportunity cost of running technology
- It allows for a shift from capital expenditure to operating expenditure
- It lowers the total cost of ownership(TCO) of technology
- It gives organizations the ability to add business value by renewed focus on core activities
As an aside, there’s a really interesting conversation going on on Focus.com discussing total IT spending – the consensus among most of us is that, while the unit price for IT will reduce because of the cloud, total IT expenditure will not but, in a nice segue back into my paper, this is actually a good thing as that extra IT spend goes directly into value adding services.
The issue here is that traditional IT expenditure has been skewed towards just “keeping the lights on.” The diagram below depicts Gartner findings that 80% of IT expenditure is non value-adding. If we remove that 80% inefficiency, and replace it with services where every penny of spend directly relates to core business, we enable a step change in the way IT departments work, and the value they drive for the organization.

In the paper we talk around several factors of Cloudonomics – the move from CapEx to OpEx, Total Cost of Ownership and Strategic Focus. We include a great case study about a fashion retailer that moved all their point of sale service to a SaaS solution from VendHQ, and in doing so saved time and money but the key thing for me here is the direct relationship that Cloud Computing introduces between cause and effect. With Cloud the equation is simple – spend can be directly related to bottom line business benefits.
There’s a lot more information in the paper itself – you can download it here. I’d also encourage you to register for the second CloudU webinar – on this webinar I’ll be joined by Bernard Golden CEO of Hyperstratus and blogger for CIO.com and also Lew Moorman, President of Rackspace’s Cloud Division. It’s going to be a really interesting session – come and join in.







Technorati Tags: